[ 
https://issues.apache.org/jira/browse/TIKA-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532790#comment-15532790
 ] 

Tim Allison commented on TIKA-2094:
-----------------------------------

That class isn't in poi-ooxml-schemas.  However, if you use the full 
ooxml-schemas, the file can be parsed.

{{poi-ooxml-schemas}} is a subset of {{ooxml-schemas}}.  See the 
[documentation|https://poi.apache.org/overview.html].  

If you grant us permission to add the vsdx file to Apache POI's test suite 
under the Apache License, we can add it to POI, and then the classes that are 
currently missing will be added to {{poi-ooxml-schemas}} in the next version of 
POI.

For now, try something like this:

{noformat}
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>${poi.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.apache.poi</groupId>
          <artifactId>poi-ooxml-schemas</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>ooxml-schemas</artifactId>
      <version>1.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>ooxml-security</artifactId>
      <version>1.1</version>
    </dependency>
{noformat}

> Error parsing .doc file with visio embed
> ----------------------------------------
>
>                 Key: TIKA-2094
>                 URL: https://issues.apache.org/jira/browse/TIKA-2094
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>         Environment: JDK7
>            Reporter: wangruochan
>         Attachments: testtika.doc, testtika.doc
>
>
> when I try to parse a  .doc file with a visio embeb,an exception occurred, 
> Print  the stacktrace  below:
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> com/microsoft/schemas/office/visio/x2012/main/ConnectsType
>       at 
> com.microsoft.schemas.office.visio.x2012.main.impl.PageContentsTypeImpl.getConnects(Unknown
>  Source)
>       at 
> org.apache.poi.xdgf.usermodel.XDGFBaseContents.onDocumentRead(XDGFBaseContents.java:89)
>       at 
> org.apache.poi.xdgf.usermodel.XDGFPageContents.onDocumentRead(XDGFPageContents.java:73)
>       at 
> org.apache.poi.xdgf.usermodel.XDGFPages.onDocumentRead(XDGFPages.java:94)
>       at 
> org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(XmlVisioDocument.java:108)
>       at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:160)
>       at 
> org.apache.poi.xdgf.usermodel.XmlVisioDocument.<init>(XmlVisioDocument.java:79)
>       at 
> org.apache.poi.xdgf.extractor.XDGFVisioExtractor.<init>(XDGFVisioExtractor.java:41)
>       at 
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:212)
>       at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
>       at 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at 
> org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
>       at 
> org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
>       at 
> org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:140)
>       at 
> org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedOfficeDoc(AbstractPOIFSExtractor.java:164)
>       at 
> org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:208)
>       at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>       at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at test.apache.tika.Test.main(Test.java:29)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
> Caused by: java.lang.ClassNotFoundException: 
> com.microsoft.schemas.office.visio.x2012.main.ConnectsType
>       at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>       at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>       ... 30 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to