[ https://issues.apache.org/jira/browse/TIKA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638883#comment-13638883 ]
Niels Beekman commented on TIKA-1111:
-------------------------------------
This was not enough for the parser, however, because the bundle was missing an import for the
org.w3c.dom package:
java.lang.NoClassDefFoundError
    at org.apache.xmlbeans.XmlBeans.class$(XmlBeans.java:43)
    at org.apache.xmlbeans.XmlBeans.buildNodeMethod(XmlBeans.java:195)
    at org.apache.xmlbeans.XmlBeans.buildNodeToCursorMethod(XmlBeans.java:232)
    at org.apache.xmlbeans.XmlBeans.<clinit>(XmlBeans.java:131)
    at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:134)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:116)
    at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:53)
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:180)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:87)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.parser.ParsingReader$ParsingTask.run(ParsingReader.java:221)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: org.w3c.dom.Node
    at org.apache.felix.framework.ModuleImpl.findClassOrResourceByDelegation(ModuleImpl.java:814)
    at org.apache.felix.framework.ModuleImpl.access$100(ModuleImpl.java:61)
    at org.apache.felix.framework.ModuleImpl$ModuleClassLoader.loadClass(ModuleImpl.java:1733)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:169)
    ... 18 more
This appears to be the same issue as TIKA-1086.
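To check which classes a bundle's class loader can and cannot see (the lookup that fails for org.w3c.dom.Node in the trace above), a small probe along these lines could be run from inside the bundle. This is a stdlib-only sketch built on `Class.forName(String, boolean, ClassLoader)`; the class `ClassVisibilityProbe` and its method are hypothetical names. It only diagnoses the problem: the comment points to a missing import of org.w3c.dom, which in OSGi is normally declared in the bundle manifest (Import-Package), and this code does not change that.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassVisibilityProbe {

    /**
     * Returns the names from classNames that the given loader cannot resolve.
     * initialize=false means static initializers (such as XmlBeans.<clinit>)
     * are not run, so the probe only tests visibility.
     */
    public static List<String> missingClasses(ClassLoader loader, String... classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not visible to this loader (or a dependency of it is missing).
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Inside an OSGi bundle, pass a class loader from that bundle, e.g. the
        // loader of one of the bundle's own classes; here we use this class's loader.
        ClassLoader loader = ClassVisibilityProbe.class.getClassLoader();
        List<String> missing = missingClasses(loader, "org.w3c.dom.Node", "com.example.NotThere");
        System.out.println("Not visible: " + missing);
    }
}
```

On a plain JVM, org.w3c.dom.Node is normally visible; inside a bundle that does not import the package, it would be expected to show up in the "Not visible" list.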
> Class loading issues when running in OSGi environment
> -----------------------------------------------------
>
> Key: TIKA-1111
> URL: https://issues.apache.org/jira/browse/TIKA-1111
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.3
> Environment: Tika 1.3 (tika-core and tika-bundle OSGi bundles)
> Felix 2.0.5
> Reporter: Niels Beekman
>
> When dom4j is on the system classpath, a class loading error occurs during
> detection of Office Open XML files:
> java.lang.ExceptionInInitializerError
>     at org.apache.poi.openxml4j.opc.internal.unmarshallers.PackagePropertiesUnmarshaller.<clinit>(PackagePropertiesUnmarshaller.java:49)
>     at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:154)
>     at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:141)
>     at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
>     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:99)
>     at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:207)
>     at org.apache.tika.parser.pkg.ZipContainerDetector.detectOfficeOpenXML(ZipContainerDetector.java:194)
>     at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:134)
>     at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:77)
>     at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
>     at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
>     at org.apache.tika.parser.ParsingReader$ParsingTask.run(ParsingReader.java:221)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.ClassCastException: org.dom4j.DocumentFactory cannot be cast to org.dom4j.DocumentFactory
>     at org.dom4j.DocumentFactory.getInstance(DocumentFactory.java:97)
>     at org.dom4j.tree.AbstractNode.<clinit>(AbstractNode.java:39)
>     ... 14 more
> As a workaround (and possibly a solution), I changed the thread context class
> loader while running detection, by wrapping the detector and the parser. This
> appears to be the common fix for dom4j, which uses the context class loader
> during initialization. Ideally, each detector and parser would run with its own
> class loader (the one ServiceLoader loaded it from) set as the context class loader.
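The workaround described above (run the detector or parser with a different thread context class loader) can be sketched with the standard `Thread.getContextClassLoader()` / `setContextClassLoader()` calls. This is a minimal, stdlib-only sketch: `ContextClassLoaderSwap` and `callWith` are hypothetical names, and the actual Tika Detector/Parser wrapper is not shown. In such a wrapper, the assumption is that you would pass the delegate's own class loader (for example `delegate.getClass().getClassLoader()`) and call the delegate's detect or parse method inside the task.

```java
import java.util.concurrent.Callable;

public final class ContextClassLoaderSwap {

    private ContextClassLoaderSwap() {}

    /**
     * Runs task with loader installed as the current thread's context class loader,
     * then restores the previous context class loader, even if task throws.
     */
    public static <T> T callWith(ClassLoader loader, Callable<T> task) throws Exception {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return task.call();
        } finally {
            thread.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the wrapped component's loader; in a real wrapper this would be
        // the delegate detector's or parser's own class loader.
        ClassLoader delegateLoader = new ClassLoader(ContextClassLoaderSwap.class.getClassLoader()) {};
        ClassLoader seen = callWith(delegateLoader,
                () -> Thread.currentThread().getContextClassLoader());
        System.out.println("delegate loader installed during call: " + (seen == delegateLoader));
        System.out.println("restored afterwards: "
                + (Thread.currentThread().getContextClassLoader() != delegateLoader));
    }
}
```

The try/finally matters because detection and parsing may run on shared threads (in both traces, ParsingReader runs the work on its own thread). Restoring the previous loader keeps a swapped loader from leaking into unrelated code on the same thread.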