Sebastian Iturra created TIKA-2055:
--------------------------------------
Summary: Exception on parsing .docx file
Key: TIKA-2055
URL: https://issues.apache.org/jira/browse/TIKA-2055
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.13
Environment: Linux Centos 7
Reporter: Sebastian Iturra
Priority: Critical

Command: java -jar tika-app-1.13.jar input.docx

Exception in thread "main" org.apache.tika.exception.TikaException: Error creating OOXML extractor
	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:120)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:191)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:480)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
Caused by: org.apache.xmlbeans.impl.values.XmlValueOutOfRangeException: Invalid int value: 4294967295
	at org.apache.xmlbeans.impl.values.JavaIntHolder.set_text(JavaIntHolder.java:43)
	at org.apache.xmlbeans.impl.values.XmlObjectBase.update_from_wscanon_text(XmlObjectBase.java:1180)
	at org.apache.xmlbeans.impl.values.XmlObjectBase.check_dated(XmlObjectBase.java:1319)
	at org.apache.xmlbeans.impl.values.JavaIntHolder.getIntValue(JavaIntHolder.java:53)
	at org.openxmlformats.schemas.officeDocument.x2006.extendedProperties.impl.CTPropertiesImpl.getTotalTime(Unknown Source)
	at org.apache.tika.parser.microsoft.ooxml.MetadataExtractor.extractMetadata(MetadataExtractor.java:124)
	at org.apache.tika.parser.microsoft.ooxml.MetadataExtractor.extract(MetadataExtractor.java:62)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:109)
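
The "Caused by" line shows XMLBeans failing to hold the extended-properties TotalTime value 4294967295 in a Java int. That number is 2^32 - 1 (0xFFFFFFFF), which is larger than Integer.MAX_VALUE (2147483647). The Java sketch below is a hypothetical illustration, not Tika's or XMLBeans' code: the class TotalTimeCheck and its helpers parseTotalTime and fitsInInt are made up for this example. It reproduces the overflow and shows one lenient way a caller could read the raw TotalTime text as a long instead.

```java
// Hypothetical illustration only; not part of Tika, POI or XMLBeans.
public class TotalTimeCheck {

    // Read TotalTime text as a long so values above Integer.MAX_VALUE
    // (such as 4294967295) do not throw. Returns -1 for non-numeric text.
    static long parseTotalTime(String text) {
        try {
            return Long.parseLong(text.trim());
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    // True if the value fits in a 32-bit signed Java int.
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        String raw = "4294967295"; // value reported in the stack trace
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        // prints "Integer.MAX_VALUE = 2147483647"
        try {
            Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            System.out.println("Integer.parseInt rejects " + raw);
            // prints "Integer.parseInt rejects 4294967295"
        }
        long total = parseTotalTime(raw);
        System.out.println("as long: " + total + ", fits in int: " + fitsInInt(total));
        // prints "as long: 4294967295, fits in int: false"
    }
}
```

Reading the value as a long, or catching the out-of-range exception and skipping only that property, are two ways a caller could avoid failing the whole parse over a single metadata field.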