[
https://issues.apache.org/jira/browse/TIKA-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415104#comment-13415104
]
Marco Quaranta commented on TIKA-950:
-------------------------------------
The file is OK; I can list it with unzip:
unzip -l test_ooxml.docx
Archive:  test_ooxml.docx
  Length     Date   Time    Name
 --------    ----   ----    ----
     2015  01-01-80 00:00   [Content_Types].xml
      590  01-01-80 00:00   _rels/.rels
     1471  01-01-80 00:00   word/_rels/document.xml.rels
    39387  01-01-80 00:00   word/document.xml
      289  01-01-80 00:00   word/_rels/header1.xml.rels
      953  01-01-80 00:00   word/footnotes.xml
      947  01-01-80 00:00   word/endnotes.xml
     2860  01-01-80 00:00   word/header1.xml
     8488  01-01-80 00:00   word/footer1.xml
     6994  01-01-80 00:00   word/theme/theme1.xml
    94033  01-01-80 00:00   word/media/image1.png
      429  01-01-80 00:00   word/_rels/settings.xml.rels
     4619  01-01-80 00:00   word/settings.xml
    12178  01-01-80 00:00   word/styles.xml
    12694  01-01-80 00:00   word/webSettings.xml
     6363  01-01-80 00:00   word/numbering.xml
      787  01-01-80 00:00   docProps/core.xml
     2657  01-01-80 00:00   word/fontTable.xml
     1038  01-01-80 00:00   docProps/app.xml
 --------                   -------
   198792                   19 files
The really strange thing is that Tika itself can open the file too (in the
detect method of ZipContainerDetector), but there it uses the ZipFile class
from the Apache Commons library instead of the one in java.util.zip. Maybe
POI should use the Commons class as well.
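
To narrow down where the two code paths diverge, one option is to list the
archive both ways with nothing but the JDK: once through java.util.zip.ZipFile
(opened from a path, which reads the central directory) and once through
java.util.zip.ZipInputStream (read sequentially from a stream). This is only a
diagnostic sketch; the class name ZipOpenComparison and its helper methods are
made up for illustration and are not part of Tika or POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, not part of Tika or POI.
public class ZipOpenComparison {

    /** Lists entry names by opening the archive from its path (uses the central directory). */
    public static List<String> listWithZipFile(Path path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    /** Lists entry names by reading the archive sequentially from a stream (uses local headers). */
    public static List<String> listWithStream(Path path) throws IOException {
        List<String> names = new ArrayList<>();
        try (InputStream in = Files.newInputStream(path);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Default file name taken from the listing above; pass another path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "test_ooxml.docx");
        try {
            System.out.println("ZipFile:        " + listWithZipFile(path));
        } catch (IOException | RuntimeException e) {
            System.out.println("ZipFile failed: " + e);
        }
        try {
            System.out.println("ZipInputStream: " + listWithStream(path));
        } catch (IOException | RuntimeException e) {
            System.out.println("ZipInputStream failed: " + e);
        }
    }
}
```

If the first call fails while the second succeeds, that would be consistent
with the path-based open being the problem, though it would not by itself show
what in the archive java.util.zip.ZipFile objects to.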
> Wrong Office Open XML detection in ZipContainerDetector
> -------------------------------------------------------
>
> Key: TIKA-950
> URL: https://issues.apache.org/jira/browse/TIKA-950
> Project: Tika
> Issue Type: Bug
> Components: mime
> Reporter: Marco Quaranta
> Priority: Minor
> Labels: detection, ooxml
> Fix For: 1.1
>
> Attachments: ZipContainerDetector.diff
>
>
> The detectOfficeOpenXML() method in the ZipContainerDetector class does not
> correctly detect an OOXML file because of an exception thrown by POI's
> OPCPackage.open(..). When that method is called with a file path string (as
> Tika does), POI uses ZipFile and the exception is raised; passing a
> FileInputStream instead lets POI detect Office Open XML formats correctly.