[ https://issues.apache.org/jira/browse/TIKA-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414957#comment-13414957 ]

Marco Quaranta commented on TIKA-950:
-------------------------------------

Unfortunately, I cannot write a unit test with the problematic file: it belongs
to the Italian public administration and is confidential.
In case it is useful, the error is caused by the ZipFile class.

This is the stacktrace:

java.util.zip.ZipException: error in opening zip file
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:127)
        at java.util.zip.ZipFile.<init>(ZipFile.java:144)
        at org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipFile(ZipHelper.java:157)
        at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:101)
        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:207)
        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOfficeOpenXML(ZipContainerDetector.java:132)
        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:75)
        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
        at org.apache.tika.Tika.detect(Tika.java:133)
        at TestFileTika.main(TestFileTika.java:62)

As I said, this error does not occur when the file is read with ZipInputStream.
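The stack trace fails inside `java.util.zip.ZipFile`, which locates entries through the central directory and the end-of-central-directory record at the end of the archive, whereas `ZipInputStream` reads the local file headers from front to back. That difference is one plausible reason a file can open with one class and not the other; the actual defect in the confidential file is not known. The following is a minimal sketch, assuming only the JDK, that tries `ZipFile` first and falls back to streaming on a `ZipException`. The class `ZipEntryLister` and its methods are hypothetical and not part of Tika or POI.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical helper: compares random-access and streaming reads of a ZIP archive.
public class ZipEntryLister {

    /** Lists entry names via the central directory (needs an intact end record). */
    static List<String> listWithZipFile(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    /** Lists entry names by reading local file headers sequentially. The caller closes the stream. */
    static List<String> listWithZipInputStream(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        ZipInputStream zin = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zin.getNextEntry()) != null) {
            names.add(entry.getName());
        }
        return names;
    }

    /** Tries ZipFile first; if it rejects the archive, falls back to streaming. */
    static List<String> listEntries(String path) throws IOException {
        try {
            return listWithZipFile(path);
        } catch (ZipException e) {
            try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
                return listWithZipInputStream(in);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : listEntries(args[0])) {
            System.out.println(name);
        }
    }
}
```

Run it as `java ZipEntryLister some-file.docx`. The fallback trades random access for tolerance: a streaming reader cannot seek directly to a named part, but it does not depend on the structures at the end of the file.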

I hope, however, to find a similar file that I can use for testing.
                
> Wrong Office Open XML detection in ZipContainerDetector
> -------------------------------------------------------
>
>                 Key: TIKA-950
>                 URL: https://issues.apache.org/jira/browse/TIKA-950
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>            Reporter: Marco Quaranta
>            Priority: Minor
>              Labels: detection, ooxml
>             Fix For: 1.1
>
>         Attachments: ZipContainerDetector.diff
>
>
> Method detectOfficeOpenXML() in the ZipContainerDetector class does not
> correctly detect an OOXML file because of an exception thrown by POI's
> OPCPackage.open(..). When called with a file path string (as Tika does),
> this method uses ZipFile, which throws the exception; passing a
> FileInputStream instead lets POI correctly detect Office Open XML
> formats.
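The issue reports that POI detects the format correctly when it is given a stream (`OPCPackage.open(InputStream)`) rather than a path. As a dependency-free illustration of stream-based sniffing, the sketch below walks the archive with `ZipInputStream`, treats it as an OPC package if it contains the `[Content_Types].xml` part that every Office Open XML package carries, and guesses the document kind from the conventional `word/`, `xl/` or `ppt/` folders. This is a simplified heuristic assumed for illustration only: the class `StreamingOoxmlSniffer` is hypothetical, and it is not how Tika's `ZipContainerDetector` or POI decide the type (it reads neither the content types nor the package relationships).

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical heuristic: sniffs OOXML from a stream, never opening the file through ZipFile.
public class StreamingOoxmlSniffer {

    /**
     * Returns "docx", "xlsx", "pptx", "ooxml" (package without a recognised folder),
     * or null if the stream has no [Content_Types].xml part. Macro-enabled and
     * template variants are not distinguished. The caller closes the stream.
     */
    static String sniff(InputStream in) throws IOException {
        boolean hasContentTypes = false;
        String kind = null;
        ZipInputStream zin = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zin.getNextEntry()) != null) {
            String name = entry.getName();
            if (name.equals("[Content_Types].xml")) {
                hasContentTypes = true;
            } else if (kind == null) {
                if (name.startsWith("word/")) kind = "docx";
                else if (name.startsWith("xl/")) kind = "xlsx";
                else if (name.startsWith("ppt/")) kind = "pptx";
            }
            if (hasContentTypes && kind != null) {
                break; // both signals found; no need to read further
            }
        }
        if (!hasContentTypes) {
            return null;
        }
        return kind != null ? kind : "ooxml";
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println(sniff(in));
        }
    }
}
```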

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
