[ https://issues.apache.org/jira/browse/TIKA-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532790#comment-15532790 ]
Tim Allison commented on TIKA-2094:
-----------------------------------
That class isn't in {{poi-ooxml-schemas}}. However, if you use the full
{{ooxml-schemas}} jar instead, the file can be parsed.
{{poi-ooxml-schemas}} is a subset of {{ooxml-schemas}}. See the
[documentation|https://poi.apache.org/overview.html].
If you grant us permission to add the vsdx file to Apache POI's test suite
under the Apache License, we can add it to POI, and then the classes that are
currently missing will be added to {{poi-ooxml-schemas}} in the next version of
POI.
For now, try something like this:
{noformat}
<!-- Keep poi-ooxml, but exclude the reduced schemas jar it pulls in -->
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>${poi.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml-schemas</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- Full schemas jar, used in place of poi-ooxml-schemas -->
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>ooxml-schemas</artifactId>
  <version>1.3</version>
</dependency>
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>ooxml-security</artifactId>
  <version>1.1</version>
</dependency>
{noformat}
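To confirm that the dependency swap took effect, a small classpath check along these lines may help. This is only a sketch: the class name {{SchemaClassCheck}} and the helper {{isOnClasspath}} are made up for illustration, and the only name taken from the report is the missing class from the stack trace.

```java
public class SchemaClassCheck {
    // Class named in the NoClassDefFoundError from the report.
    static final String MISSING_CLASS =
            "com.microsoft.schemas.office.visio.x2012.main.ConnectsType";

    /** Hypothetical helper: true if the named class can be located on the classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, SchemaClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(MISSING_CLASS + " available: " + isOnClasspath(MISSING_CLASS));
    }
}
```

Run it with the same classpath as your application. If it still reports the class as unavailable, {{poi-ooxml-schemas}} is probably still being resolved ahead of, or instead of, {{ooxml-schemas}}.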
> Error parsing .doc file with visio embed
> ----------------------------------------
>
> Key: TIKA-2094
> URL: https://issues.apache.org/jira/browse/TIKA-2094
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.13
> Environment: JDK7
> Reporter: wangruochan
> Attachments: testtika.doc, testtika.doc
>
>
> When I try to parse a .doc file with an embedded Visio document, an exception occurs.
> The stack trace is printed below:
> Exception in thread "main" java.lang.NoClassDefFoundError: com/microsoft/schemas/office/visio/x2012/main/ConnectsType
>     at com.microsoft.schemas.office.visio.x2012.main.impl.PageContentsTypeImpl.getConnects(Unknown Source)
>     at org.apache.poi.xdgf.usermodel.XDGFBaseContents.onDocumentRead(XDGFBaseContents.java:89)
>     at org.apache.poi.xdgf.usermodel.XDGFPageContents.onDocumentRead(XDGFPageContents.java:73)
>     at org.apache.poi.xdgf.usermodel.XDGFPages.onDocumentRead(XDGFPages.java:94)
>     at org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(XmlVisioDocument.java:108)
>     at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:160)
>     at org.apache.poi.xdgf.usermodel.XmlVisioDocument.<init>(XmlVisioDocument.java:79)
>     at org.apache.poi.xdgf.extractor.XDGFVisioExtractor.<init>(XDGFVisioExtractor.java:41)
>     at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:212)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
>     at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
>     at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:140)
>     at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedOfficeDoc(AbstractPOIFSExtractor.java:164)
>     at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:208)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at test.apache.tika.Test.main(Test.java:29)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
> Caused by: java.lang.ClassNotFoundException: com.microsoft.schemas.office.visio.x2012.main.ConnectsType
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 30 more