Hello
I thought that adding an engine that extracts XMP metadata and converts EXIF
data to XMP would be pretty straightforward (especially since Clerezza
provides a bundle with such utilities).
However, I've noticed that the Tika engine already processes JPEGs, but for
the JPEG I've been testing it with I get:
Caused by: org.apache.stanbol.enhancer.servicesapi.EngineException: Unable to convert ContentItem <urn:content-item-sha1-13b7a6ca2636d1e1e8d36b4bc69d623947a6acb7> with mimeType 'image/jpeg' to plain text!
    at org.apache.stanbol.enhancer.engines.tika.TikaEngine.computeEnhancements(TikaEngine.java:222)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:259)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:181)
    at org.apache.felix.eventadmin.impl.tasks.HandlerTaskImpl.execute(HandlerTaskImpl.java:88)
    at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:221)
    at org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:110)
    at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.tika.exception.TikaException: Can't read JPEG metadata
    at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:104)
    at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.stanbol.enhancer.engines.tika.TikaEngine.computeEnhancements(TikaEngine.java:220)
    ... 7 more
Caused by: com.drew.imaging.jpeg.JpegProcessingException: segment size would extend beyond file stream length
    at com.drew.imaging.jpeg.JpegSegmentReader.readSegments(Unknown Source)
    at com.drew.imaging.jpeg.JpegSegmentReader.<init>(Unknown Source)
    at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:94)
    ... 13 more
Now it's not surprising that a JPEG cannot be converted to plain text, but
why does Tika attempt it in the first place, and why can't the JPEG metadata
be read?
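
To rule out a damaged file, here is a minimal sketch of a standalone check for the condition the innermost exception describes ("segment size would extend beyond file stream length"). It assumes the usual JPEG header layout: an SOI marker (0xFFD8), followed by segments made of 0xFF, a marker byte, and a two-byte big-endian length that counts the length bytes themselves. The class and method names are made up for illustration; this is not how JpegSegmentReader is implemented, only an approximation of the same sanity check.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class JpegSegmentCheck {

    /**
     * Walks the header segments of a JPEG and returns the byte offset at which
     * the segment structure stops being consistent (no marker where one is
     * expected, or a declared length that runs past the end of the data).
     * Returns -1 if everything up to start-of-scan / end-of-image fits.
     */
    public static int firstInconsistentSegment(byte[] data) {
        if (data.length < 2 || (data[0] & 0xFF) != 0xFF || (data[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("not a JPEG: missing SOI marker");
        }
        int pos = 2;
        while (pos + 4 <= data.length) {
            if ((data[pos] & 0xFF) != 0xFF) {
                return pos; // expected a marker prefix here
            }
            int marker = data[pos + 1] & 0xFF;
            if (marker == 0xFF) {
                pos++; // fill byte before the real marker
                continue;
            }
            if (marker == 0xDA || marker == 0xD9) {
                return -1; // start of scan or end of image: headers are done
            }
            if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) {
                pos += 2; // standalone markers carry no length field
                continue;
            }
            // The length is big-endian and includes its own two bytes.
            int length = ((data[pos + 2] & 0xFF) << 8) | (data[pos + 3] & 0xFF);
            if (length < 2 || pos + 2 + length > data.length) {
                return pos;
            }
            pos += 2 + length;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int bad = firstInconsistentSegment(data);
        System.out.println(bad < 0
                ? "all header segments fit inside the file"
                : "inconsistent segment at byte offset " + bad);
    }
}
```

If this reports an offset for the test image, the file itself is probably truncated or otherwise malformed; if it reports nothing, the problem more likely lies in how the content stream is handed to the parser.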
Any ideas?
Cheers,
Reto