On Tue, Sep 7, 2010 at 10:43 AM, Nick Burch <[email protected]> wrote:
> On Mon, 6 Sep 2010, Ken Krugler wrote:
>>
>> I recently updated the Bixo project to use Tika 0.8-SNAPSHOT, and a number
>> of documents now fail during parsing that previously passed.
>
> Any chance you could create a new jira issue, and upload one of the problem
> documents?
>
>> Did the Tika-0.7 image parsers (JPEG, GIF, PNG) not extract metadata, and
>> thus not run into these types of issues?
>
> The image metadata stuff has changed dramatically since 0.7, and we're now
> processing a lot more of the files in search of useful metadata than we used
> to.
>

The exception is thrown before we start to extract the metadata. It
looks like the file is auto-detected as a JPEG, but the EXIF parser
(the same version that Tika has used for a long time) says it is not a
JPEG. Please attach one of the failing files to the issue.

/Staffan
