Hi devs,
I recently updated the Bixo project to use Tika 0.8-SNAPSHOT, and a
number of documents that previously parsed without errors now fail.
Many of these failures seem related to image processing. For example:
Caused by: org.apache.tika.exception.TikaException: Can't read JPEG metadata
    at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:71)
    at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:163)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:110)
    at bixo.parser.TikaCallable.call(TikaCallable.java:63)
    at bixo.parser.TikaCallable.call(TikaCallable.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.lang.Thread.run(Thread.java:637)
Caused by: com.drew.imaging.jpeg.JpegProcessingException: not a jpeg file
    at com.drew.imaging.jpeg.JpegSegmentReader.readSegments(Unknown Source)
    at com.drew.imaging.jpeg.JpegSegmentReader.<init>(Unknown Source)
    at com.drew.imaging.jpeg.JpegMetadataReader.readMetadata(Unknown Source)
    at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:67)
    ... 8 more
Did the Tika 0.7 image parsers (JPEG, GIF, PNG) skip metadata
extraction, and so never run into these kinds of errors?
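Since the nested JpegProcessingException says "not a jpeg file", one quick way to triage the failing documents would be to check whether they actually begin with the JPEG SOI marker (bytes 0xFF 0xD8). Below is a minimal sketch of such a check. The class JpegSignatureCheck and its method are hypothetical helpers written for illustration; they are not part of Tika, Bixo, or the metadata-extractor library, and the check says nothing about whether the rest of the file is valid.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for triage only; not part of Tika or Bixo.
class JpegSignatureCheck {

    // Returns true if the data begins with the JPEG SOI marker (0xFF 0xD8).
    // This inspects only the first two bytes, nothing else.
    static boolean looksLikeJpeg(byte[] data) {
        return data != null
                && data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8;
    }

    // Prints, for each file path given on the command line, whether the
    // file starts with the SOI marker.
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            int b0;
            int b1;
            try (InputStream in = new FileInputStream(path)) {
                b0 = in.read(); // -1 if the file is empty
                b1 = in.read(); // -1 if the file has fewer than two bytes
            }
            boolean ok = b0 >= 0 && b1 >= 0
                    && looksLikeJpeg(new byte[] {(byte) b0, (byte) b1});
            System.out.println(path + ": " + (ok ? "starts with SOI" : "no SOI marker"));
        }
    }
}
```

If a failing document has no SOI marker, that would suggest the content is being labeled as JPEG even though it is something else; if it does have one, the problem more likely lies in the metadata reading itself.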
Thanks,
-- Ken
--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g