[ https://issues.apache.org/jira/browse/TIKA-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415104#comment-13415104 ]

Marco Quaranta commented on TIKA-950:
-------------------------------------

The file is OK; I can list its contents with unzip:

unzip -l test_ooxml.docx

Archive:  test_ooxml.docx
  Length     Date   Time    Name
 --------    ----   ----    ----
     2015  01-01-80 00:00   [Content_Types].xml
      590  01-01-80 00:00   _rels/.rels
     1471  01-01-80 00:00   word/_rels/document.xml.rels
    39387  01-01-80 00:00   word/document.xml
      289  01-01-80 00:00   word/_rels/header1.xml.rels
      953  01-01-80 00:00   word/footnotes.xml
      947  01-01-80 00:00   word/endnotes.xml
     2860  01-01-80 00:00   word/header1.xml
     8488  01-01-80 00:00   word/footer1.xml
     6994  01-01-80 00:00   word/theme/theme1.xml
    94033  01-01-80 00:00   word/media/image1.png
      429  01-01-80 00:00   word/_rels/settings.xml.rels
     4619  01-01-80 00:00   word/settings.xml
    12178  01-01-80 00:00   word/styles.xml
    12694  01-01-80 00:00   word/webSettings.xml
     6363  01-01-80 00:00   word/numbering.xml
      787  01-01-80 00:00   docProps/core.xml
     2657  01-01-80 00:00   word/fontTable.xml
     1038  01-01-80 00:00   docProps/app.xml
 --------                   -------
   198792                   19 files
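For comparison with the `unzip -l` check above, here is a minimal Java sketch that lists the same archive through the JDK's java.util.zip.ZipFile, which reads the archive's central directory. If `unzip -l` succeeds but this throws, the difference lies in how the JDK class reads the archive rather than in the file being unreadable in general. The class name `ListZipEntries` is hypothetical, and the output format only roughly imitates `unzip -l`.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper, not part of Tika or POI.
final class ListZipEntries {

    // Returns one "size  name" line per entry, in central-directory order.
    static List<String> listEntries(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        // java.util.zip.ZipFile opens the archive by path and reads its central directory.
        try (ZipFile zip = new ZipFile(file)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                lines.add(entry.getSize() + "  " + entry.getName());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : listEntries(new File(args[0]))) {
            System.out.println(line);
        }
    }
}
```

After compiling, run it as `java ListZipEntries test_ooxml.docx`; a ZipException at this point would reproduce the JDK-side failure independently of Tika and POI.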


The really strange thing is that Tika is able to open it too (in the
ZipContainerDetector detect() method), but there it uses ZipFile from the Apache
Commons library instead of java.util.zip. Maybe POI should use the Commons package as well.
                
> Wrong Office Open XML detection in ZipContainerDetector
> -------------------------------------------------------
>
>                 Key: TIKA-950
>                 URL: https://issues.apache.org/jira/browse/TIKA-950
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>            Reporter: Marco Quaranta
>            Priority: Minor
>              Labels: detection, ooxml
>             Fix For: 1.1
>
>         Attachments: ZipContainerDetector.diff
>
>
> Method detectOfficeOpenXML() in the ZipContainerDetector class does not correctly
> detect an OOXML file because of an exception thrown by POI's OPCPackage.open(..).
> When that method is called with a file path string (as Tika does), it uses
> ZipFile internally, and this generates an exception; passing a FileInputStream
> instead makes POI correctly detect Office Open XML formats.
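The issue's fix hinges on handing POI a stream instead of a path. As a stdlib-only illustration of the stream-based approach (not Tika's or POI's actual detection code), the sketch below walks the archive with java.util.zip.ZipInputStream, which reads entry headers sequentially, and checks for the `[Content_Types].xml` part that every OPC package carries at its root. The names `OoxmlStreamCheck` and `looksLikeOoxml` are hypothetical; POI's OPCPackage.open(InputStream) does far more validation than this.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper illustrating stream-based ZIP inspection; not Tika or POI code.
final class OoxmlStreamCheck {

    // OPC packages (ECMA-376 Part 2) store their content-type map under this root entry name.
    static final String CONTENT_TYPES = "[Content_Types].xml";

    // Reads entries one by one from the stream instead of opening the file by path
    // with java.util.zip.ZipFile. The caller owns and closes `in`.
    static boolean looksLikeOoxml(InputStream in) throws IOException {
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (CONTENT_TYPES.equals(entry.getName())) {
                return true;
            }
        }
        return false;
    }
}
```

Usage would look like `try (InputStream in = new FileInputStream("test_ooxml.docx")) { OoxmlStreamCheck.looksLikeOoxml(in); }`. The stream is deliberately not closed inside the helper, so a caller such as a detector can keep ownership of it.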

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
