[ 
https://issues.apache.org/jira/browse/STANBOL-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583113#comment-13583113
 ] 

Rupert Westenthaler commented on STANBOL-810:
---------------------------------------------

Finally had some time to look into this. I think this is related to the Apache 
Tika bundle defining

    javax.xml.stream; resolution:=optional; version="[1.0, 2)", 
    javax.xml.stream.events; resolution:=optional; version="[1.0, 2)", 
    javax.xml.stream.util; resolution:=optional; version="[1.0, 2)"

in its Import-Package header. However, the Stanbol framework fragment 
exports those packages without any version information. This is done for all 
packages originating from the Java runtime, because one cannot know the 
version provided by the actual Java runtime.

For packages exported without a version, OSGi assumes version 0.0.0, which is 
outside the range accepted by the Apache Tika bundle. Because of this, those 
packages are not imported by Tika, which may break parsing of all XML-based 
document formats.

IMO it would be best if the Tika bundle removed the version range for 
those imports. javax.xml.stream has been part of the Java runtime since 
Java SE 6, so in most use cases it will be provided by the system. However, 
as an immediate solution the Stanbol framework fragment could also define a 
suitable version for those packages.

The release notes of Java 6 JAXP [1] note that it provides specification 
version 1.4. Because of that, defining 1.4.0 as the version for those packages 
seems suitable for Stanbol.
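
To illustrate the range check described above, here is a minimal sketch in 
plain Java. It does not use the OSGi API: the class ImportRangeSketch and its 
methods are hypothetical helpers written only for this example. They 
hand-roll the comparison for the range [1.0, 2) and ignore the optional 
qualifier segment of OSGi versions.

```java
// Hypothetical helper, not part of OSGi, Tika or Stanbol. It only mimics
// how a version range such as [1.0, 2) is matched against an exported
// package version of the form major.minor.micro.
final class ImportRangeSketch {

    // Parse "major.minor.micro". Missing segments, or a missing version
    // altogether, default to 0, so an unversioned export becomes 0.0.0.
    static int[] parse(String version) {
        int[] parts = new int[3];
        if (version == null || version.trim().isEmpty()) {
            return parts;
        }
        String[] segments = version.trim().split("\\.");
        for (int i = 0; i < parts.length && i < segments.length; i++) {
            parts[i] = Integer.parseInt(segments[i]);
        }
        return parts;
    }

    // Compare two parsed versions segment by segment.
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    // Range [1.0, 2): the floor 1.0.0 is inclusive, the ceiling 2.0.0 is
    // exclusive.
    static boolean satisfiesTikaImport(String exportedVersion) {
        int[] v = parse(exportedVersion);
        return compare(v, parse("1.0")) >= 0 && compare(v, parse("2")) < 0;
    }

    public static void main(String[] args) {
        // Export without a version, treated as 0.0.0
        System.out.println("no version: " + satisfiesTikaImport(null));    // false
        // Version proposed for the Stanbol framework fragment
        System.out.println("1.4.0:      " + satisfiesTikaImport("1.4.0")); // true
        // Ceiling of the range is excluded
        System.out.println("2.0.0:      " + satisfiesTikaImport("2.0.0")); // false
    }
}
```

In a real bundle this matching is done by the OSGi framework during 
resolution. The sketch only shows why an export without a version falls below 
the floor of the range while 1.4.0 lies inside it.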


                
> TIKA causes java.lang.NoClassDefFoundError with org.apache.xmlbeans.XmlBeans 
> on some plattforms when processing .docx files
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: STANBOL-810
>                 URL: https://issues.apache.org/jira/browse/STANBOL-810
>             Project: Stanbol
>          Issue Type: Bug
>          Components: Engine - Tika
>         Environment: This error DOES NOT appear on a Stanbol Server running 
> on Linux using
>     java version "1.6.0_26"
>     Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
>     Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
> JDK 1.7 seems to be affected. Further information needed/requested
>            Reporter: Rupert Westenthaler
>         Attachments: response.txt
>
>
> This was first reported by Dr Andriy Nikolov in 
> http://markmail.org/message/x4y4n5drty56zxtq
> Users affected by that will notice an Exception with the following cause
> Caused by: java.lang.NoClassDefFoundError: Could not initialize 
> class org.apache.xmlbeans.XmlBeans
>       at 
> org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown
>  Source)
>       at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:134)
>       at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
>       at 
> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:116)
>       at 
> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:53)
>       at 
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:180)
>       at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:87)
>       at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
>       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at 
> org.apache.stanbol.enhancer.engines.tika.TikaEngine.computeEnhancements(TikaEngine.java:220)
>       at 
> org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:259)
>       at 
> org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:181)
>       at 
> org.apache.felix.eventadmin.impl.tasks.HandlerTaskImpl.execute(HandlerTaskImpl.java:88)
>       at 
> org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:221)
>       at 
> org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:110)
>       at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown 
> Source)
>       at java.lang.Thread.run(Thread.java:722)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
