On Tue, Sep 7, 2010 at 10:43 AM, Nick Burch <[email protected]> wrote:
> On Mon, 6 Sep 2010, Ken Krugler wrote:
>>
>> I recently updated the Bixo project to use Tika 0.8-SNAPSHOT, and a number
>> of documents now fail during parsing that previously passed.
>
> Any chance you could create a new jira issue, and upload one of the problem
> documents?
>
>> Did the Tika-0.7 image parsers (JPEG, GIF, PNG) not extract metadata, and
>> thus not run into these types of issues?
>
> The image metadata stuff has changed dramatically since 0.7, and we're now
> processing a lot more of the files in search of useful metadata than we used
> to.
>

The exception is thrown before we start to extract the metadata. It looks like the file is auto-detected as a JPEG, but the EXIF parser (the same version that Tika has used for a long time) says it is not a JPEG. Please attach one of the failing files to the issue.

/Staffan
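
P.S. As a quick sanity check before attaching a file, something like the sketch below could show whether it really begins with the JPEG start-of-image marker (bytes 0xFF 0xD8). This is plain Java and only an illustration: the class and method names are made up for this example, and it is not what Tika's detector or the EXIF parser actually does internally.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Tika: checks only the first two bytes.
public class JpegMagicCheck {

    // Returns true if the stream begins with the JPEG SOI marker (0xFF 0xD8).
    // InputStream.read() returns 0..255, or -1 at end of stream, so a file
    // shorter than two bytes simply yields false.
    static boolean startsWithJpegMarker(InputStream in) throws IOException {
        int first = in.read();
        int second = in.read();
        return first == 0xFF && second == 0xD8;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JpegMagicCheck <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(args[0] + " starts with JPEG SOI marker: "
                    + startsWithJpegMarker(in));
        }
    }
}
```

If this reports false for a file that was detected as a JPEG, the file is probably mislabeled or truncated, which would be useful to mention in the issue.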
