Seva Alekseyev created TIKA-2203:
------------------------------------
Summary: InvalidOperationException on a valid Word file
Key: TIKA-2203
URL: https://issues.apache.org/jira/browse/TIKA-2203
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.14
Environment: Windows 7 x64, JVM 1.8.0_101
Reporter: Seva Alekseyev
Attachments: OPCCompliance_DerivedPartNameFAIL.docx
The attached Word file opens without problems in Word, but Tika fails to parse it with the following exception:
org.apache.tika.exception.TikaException: Error creating OOXML extractor
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse:123
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse:87
at gov.nih.niaid.fscanner.Extract.ExtractContents:69
Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: You can't add a part with a part name derived from another part ! [M1.11]
at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl:338
at org.apache.poi.openxml4j.opc.OPCPackage.getParts:774
at org.apache.poi.openxml4j.opc.OPCPackage.open:268
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse:69
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse:87
Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException: You can't add a part with a part name derived from another part ! [M1.11]
at org.apache.poi.openxml4j.opc.PackagePartCollection.put:66
at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl:336
at org.apache.poi.openxml4j.opc.OPCPackage.getParts:774
at org.apache.poi.openxml4j.opc.OPCPackage.open:268
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse:69
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse:87
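
For context, the [M1.11] tag in the message refers to the Open Packaging Conventions rule that a part name must not be derived from another part name by appending segments to it (for example, /word/document.xml next to /word/document.xml/extra.xml). The sketch below illustrates that kind of check using only the JDK. It is not POI's implementation: the class name DerivedPartNameCheck, the method findDerived, and the sample part names are hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class DerivedPartNameCheck {

    /**
     * Returns one message per part name that extends another part name
     * by appending further segments. Names are compared case-insensitively.
     */
    public static List<String> findDerived(List<String> partNames) {
        List<String> conflicts = new ArrayList<>();
        for (String base : partNames) {
            // A derived name starts with the base name followed by a '/'.
            String prefix = base.toLowerCase(Locale.ROOT) + "/";
            for (String other : partNames) {
                if (other.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                    conflicts.add(other + " is derived from " + base);
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        // Hypothetical part names; the second one violates the rule.
        List<String> names = Arrays.asList(
                "/word/document.xml",
                "/word/document.xml/extra.xml",
                "/word/styles.xml");
        for (String conflict : findDerived(names)) {
            System.out.println(conflict);
        }
    }
}
```

A package containing such a pair of names would be rejected by a strict reader, which may explain why the file is accepted by Word but not by the parser here.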