[https://issues.apache.org/jira/browse/STANBOL-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583113#comment-13583113]
Rupert Westenthaler commented on STANBOL-810:
---------------------------------------------
Finally had some time to look into this. I think this is related to the Apache
Tika bundle defining
javax.xml.stream; resolution:=optional; version="[1.0, 2)",
javax.xml.stream.events; resolution:=optional; version="[1.0, 2)",
javax.xml.stream.util; resolution:=optional; version="[1.0, 2)"
in its Import-Package definitions. However, the Stanbol framework fragment
exports those packages without any version information. This is done for all
packages originating from the Java runtime, because one cannot know the
version provided by the actual Java runtime.
For packages without a version number, OSGi assumes version 0.0.0, which is
outside the range accepted by the Apache Tika bundle. Because the imports are
declared with resolution:=optional, the Tika bundle still resolves, but those
packages are not wired to it, which may break parsing of all XML-based
document formats.
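To make the mismatch concrete, here is a minimal Python sketch of the version-range check. It is not the OSGi framework's own (Java) implementation; the helper names parse_version, parse_range and VersionRange are hypothetical, and version qualifiers are ignored for brevity. It follows the OSGi interval notation: "[" and "]" are inclusive bounds, "(" and ")" exclusive, and a bare version means "that version or later".

```python
from typing import NamedTuple, Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse 'major[.minor[.micro]]'; missing parts default to 0, so an
    empty string (no version attribute) yields 0.0.0. Qualifiers are ignored."""
    nums = [int(p) for p in text.strip().split(".")[:3]] if text.strip() else []
    nums += [0] * (3 - len(nums))
    return (nums[0], nums[1], nums[2])


class VersionRange(NamedTuple):
    floor: Version
    floor_inclusive: bool
    ceiling: Optional[Version]  # None means unbounded
    ceiling_inclusive: bool

    def includes(self, v: Version) -> bool:
        above = v >= self.floor if self.floor_inclusive else v > self.floor
        if self.ceiling is None:
            return above
        below = v <= self.ceiling if self.ceiling_inclusive else v < self.ceiling
        return above and below


def parse_range(text: str) -> VersionRange:
    """Parse an interval like '[1.0, 2)'; a bare version 'x' means 'x or later'."""
    text = text.strip()
    if text[0] not in "[(":
        return VersionRange(parse_version(text), True, None, False)
    floor_text, ceiling_text = text[1:-1].split(",")
    return VersionRange(parse_version(floor_text), text[0] == "[",
                        parse_version(ceiling_text), text[-1] == "]")


tika_range = parse_range("[1.0, 2)")  # range from Tika's Import-Package
print(tika_range.includes(parse_version("")))       # unversioned export (0.0.0): False
print(tika_range.includes(parse_version("1.4.0")))  # versioned export: True
```

An import without a version attribute is treated as the range 0.0.0 (any version), so parse_range("0.0.0") accepts the unversioned export. This is why dropping the range on the Tika side would also let these imports be wired.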
IMO the best solution would be for the Tika bundle to drop the version range
from those imports. javax.xml.stream has been part of the Java runtime since
Java SE 6, so in most use cases it will be provided by the system. However, as
an immediate solution, the Stanbol framework fragment could also define a
suitable version for those packages.
The release notes of Java 6 JAXP [1] state that it provides specification
version 1.4. Because of that, defining 1.4.0 as the version for those packages
seems suitable for Stanbol.
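The proposed Stanbol-side fix can be checked the same way. The Python sketch below compares the current unversioned exports with exports carrying version="1.4.0". It assumes the fragment's exports can be written as an Export-Package-style header; the header strings and the helper names parse_exports and satisfies_tika are illustrative, not copied from the actual Stanbol fragment, and the parser is simplified (no commas inside quoted values, qualifiers ignored).

```python
from typing import Dict, Tuple

Version = Tuple[int, int, int]
TIKA_FLOOR: Version = (1, 0, 0)    # Tika imports javax.xml.stream* with [1.0, 2)
TIKA_CEILING: Version = (2, 0, 0)


def parse_exports(header: str) -> Dict[str, Version]:
    """Map each package in an Export-Package style header to its version.
    A clause without a version attribute gets the OSGi default 0.0.0.
    Simplified: assumes no commas inside quoted attribute values."""
    exports: Dict[str, Version] = {}
    for clause in header.split(","):
        name, *attrs = [part.strip() for part in clause.split(";")]
        nums = [0, 0, 0]
        for attr in attrs:
            key, _, value = attr.partition("=")
            if key.strip() == "version":
                parsed = [int(n) for n in value.strip().strip('"').split(".")[:3]]
                nums = parsed + [0] * (3 - len(parsed))
        exports[name] = (nums[0], nums[1], nums[2])
    return exports


def satisfies_tika(version: Version) -> bool:
    # Half-open interval [1.0, 2): floor inclusive, ceiling exclusive
    return TIKA_FLOOR <= version < TIKA_CEILING


CURRENT = "javax.xml.stream, javax.xml.stream.events, javax.xml.stream.util"
PROPOSED = ('javax.xml.stream;version="1.4.0", '
            'javax.xml.stream.events;version="1.4.0", '
            'javax.xml.stream.util;version="1.4.0"')

for label, header in (("current", CURRENT), ("proposed", PROPOSED)):
    for pkg, version in parse_exports(header).items():
        print(label, pkg, version, satisfies_tika(version))
```

Every current export comes out as (0, 0, 0) and fails the check, while every proposed export comes out as (1, 4, 0) and passes.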
> TIKA causes java.lang.NoClassDefFoundError with org.apache.xmlbeans.XmlBeans
> on some platforms when processing .docx files
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: STANBOL-810
> URL: https://issues.apache.org/jira/browse/STANBOL-810
> Project: Stanbol
> Issue Type: Bug
> Components: Engine - Tika
> Environment: This error DOES NOT appear on a Stanbol Server running
> on Linux using
> java version "1.6.0_26"
> Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
> JDK 1.7 seems to be affected. Further information needed/requested.
> Reporter: Rupert Westenthaler
> Attachments: response.txt
>
>
> This was first reported by Dr Andriy Nikolov in
> http://markmail.org/message/x4y4n5drty56zxtq
> Affected users will notice an exception with the following cause:
> Caused by:
> java.lang.NoClassDefFoundError: Could not initialize class org.apache.xmlbeans.XmlBeans
>     at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
>     at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:134)
>     at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
>     at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:116)
>     at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:53)
>     at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:180)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:87)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.stanbol.enhancer.engines.tika.TikaEngine.computeEnhancements(TikaEngine.java:220)
>     at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:259)
>     at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:181)
>     at org.apache.felix.eventadmin.impl.tasks.HandlerTaskImpl.execute(HandlerTaskImpl.java:88)
>     at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:221)
>     at org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:110)
>     at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Thread.java:722)