Hi devs,

I recently updated the Bixo project to use Tika 0.8-SNAPSHOT, and a number of documents now fail during parsing that previously passed.

Many of these failures seem related to image processing. For example:

Caused by: org.apache.tika.exception.TikaException: Can't read JPEG metadata
        at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:71)
        at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:163)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:110)
        at bixo.parser.TikaCallable.call(TikaCallable.java:63)
        at bixo.parser.TikaCallable.call(TikaCallable.java:1)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.lang.Thread.run(Thread.java:637)
Caused by: com.drew.imaging.jpeg.JpegProcessingException: not a jpeg file
        at com.drew.imaging.jpeg.JpegSegmentReader.readSegments(Unknown Source)
        at com.drew.imaging.jpeg.JpegSegmentReader.<init>(Unknown Source)
        at com.drew.imaging.jpeg.JpegMetadataReader.readMetadata(Unknown Source)
        at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:67)
        ... 8 more
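
The root cause above is "not a jpeg file", so one way to narrow this down is to check whether the failing documents really start with the JPEG start-of-image marker (bytes 0xFF 0xD8). The following is only a minimal sketch of such a check: it is not Tika or Bixo code, and the class name JpegMagicCheck and method looksLikeJpeg are hypothetical names made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika or Bixo.
public class JpegMagicCheck {

    // Returns true if the stream begins with the JPEG SOI marker (0xFF 0xD8).
    // Consumes up to two bytes from the stream.
    public static boolean looksLikeJpeg(InputStream in) throws IOException {
        int first = in.read();
        int second = in.read();
        return first == 0xFF && second == 0xD8;
    }

    public static void main(String[] args) throws IOException {
        // Sample data: a JPEG-style header and a GIF header.
        byte[] jpegHeader = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        byte[] gifHeader = "GIF89a".getBytes(StandardCharsets.US_ASCII);

        System.out.println(looksLikeJpeg(new ByteArrayInputStream(jpegHeader)));
        System.out.println(looksLikeJpeg(new ByteArrayInputStream(gifHeader)));
    }
}
```

If a document that fails with the exception above does not pass a check like this, the content was probably served or detected as image/jpeg without actually being JPEG data, which would point at type detection rather than at the metadata extractor itself.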

Did the Tika 0.7 image parsers (JPEG, GIF, PNG) skip metadata extraction, which would explain why they never ran into this kind of failure?

Thanks,

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g




