Simon Gaeremynck created TIKA-3556:
--------------------------------------

             Summary: DefaultZipContainerDetector returns application/zip for .odt files when OPCPackageDetector is on the classpath
                 Key: TIKA-3556
                 URL: https://issues.apache.org/jira/browse/TIKA-3556
             Project: Tika
          Issue Type: Bug
          Components: detector
    Affects Versions: 2.1.0
            Reporter: Simon Gaeremynck


This is happening because the OPCPackageDetector.detect method will [fail and close the underlying zip stream|https://github.com/apache/tika/blob/2.1.0-rc2/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/detect/microsoft/ooxml/OPCPackageDetector.java#L257]. When the next detector runs (e.g. OpenDocumentDetector), the stream it receives has already been closed, so it can't detect anything.

After all detectors have effectively been no-ops, [the DefaultZipContainerDetector falls back to application/zip|https://github.com/apache/tika/blob/2.1.0-rc2/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/src/main/java/org/apache/tika/detect/zip/DefaultZipContainerDetector.java#L209].
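To make the failure mode concrete, here is a self-contained toy model of the sequence above, written against the plain JDK with no Tika classes: one sub-detector closes the shared stream on its failure path, the next one can no longer read it, and the loop falls back to application/zip. `SubDetector`, `ClosableSource`, `OPC_LIKE`, `ODF_LIKE` and `detectZip` are hypothetical names, and treating an I/O error as "no answer" is an assumption of the model, not necessarily what DefaultZipContainerDetector does internally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class ClosedStreamDemo {

    /** Hypothetical stand-in for a zip sub-detector: returns a media type, or null for "no match". */
    interface SubDetector {
        String detect(InputStream zip) throws IOException;
    }

    /** Stand-in for the shared zip stream: every read fails once close() has been called. */
    static final class ClosableSource extends InputStream {
        private final InputStream delegate;
        private boolean closed;

        ClosableSource(byte[] data) {
            this.delegate = new ByteArrayInputStream(data);
        }

        @Override
        public int read() throws IOException {
            if (closed) {
                throw new IOException("Stream closed");
            }
            return delegate.read();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    static final String ODT = "application/vnd.oasis.opendocument.text";
    // Toy payload: ODF packages normally start with an uncompressed "mimetype" entry,
    // so these bytes appear near the start of a real .odt zip.
    static final byte[] ODT_BYTES = ("mimetype" + ODT).getBytes(StandardCharsets.US_ASCII);

    /** Mimics the reported failure path: gives up and closes the stream it was handed. */
    static final SubDetector OPC_LIKE = zip -> {
        zip.close();
        return null;
    };

    /** Looks for the ODF mimetype marker; it sees nothing if the stream was already closed. */
    static final SubDetector ODF_LIKE = zip -> {
        String content = new String(zip.readAllBytes(), StandardCharsets.US_ASCII);
        return content.contains("mimetype" + ODT) ? ODT : null;
    };

    /** Toy zip-container loop: the first non-null answer wins, otherwise generic zip. */
    static String detectZip(InputStream zip, List<SubDetector> detectors) {
        for (SubDetector detector : detectors) {
            try {
                String type = detector.detect(zip);
                if (type != null) {
                    return type;
                }
            } catch (IOException e) {
                // Model assumption: a sub-detector that hits an I/O error simply has no answer.
            }
        }
        return "application/zip";
    }

    public static void main(String[] args) {
        // The ODF-like detector on its own finds the marker.
        System.out.println(detectZip(new ClosableSource(ODT_BYTES), List.of(ODF_LIKE)));
        // The OPC-like detector runs first and closes the stream, so the result is application/zip.
        System.out.println(detectZip(new ClosableSource(ODT_BYTES), List.of(OPC_LIKE, ODF_LIKE)));
    }
}
```

Running `main` prints `application/vnd.oasis.opendocument.text` and then `application/zip`: the same bytes give different answers depending only on whether a closing detector ran first.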

Now, when running with the default CompositeDetector, the next detector is usually the MimeTypes detector. It returns the correct application/vnd.oasis.opendocument.text, but the [CompositeDetector will ignore|https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/detect/CompositeDetector.java#L86] that result because the type isn't declared as a subclass of application/zip in [the registry|https://github.com/apache/tika/blob/2.1.0-rc2/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L2327].
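The rule described above (a later detector's answer only replaces the current one if the registry says it is a specialization of it) can be modelled in a few lines. This is a simplified, self-contained sketch rather than Tika's code: `Registry`, `subClassOf`, `compose` and the default-to-octet-stream supertype are assumptions of the model. It compares the current registry with one where ODT is declared a subclass of application/zip.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SpecializationDemo {

    static final String OCTET_STREAM = "application/octet-stream";
    static final String ZIP = "application/zip";
    static final String ODT = "application/vnd.oasis.opendocument.text";

    /** Toy type registry: each type has one declared supertype, defaulting to octet-stream. */
    static final class Registry {
        private final Map<String, String> superTypes = new HashMap<>();

        Registry subClassOf(String type, String superType) {
            superTypes.put(type, superType);
            return this;
        }

        private String superTypeOf(String type) {
            if (type.equals(OCTET_STREAM)) {
                return null; // root of the hierarchy
            }
            return superTypes.getOrDefault(type, OCTET_STREAM);
        }

        /** True if {@code a} sits strictly below {@code b} in the declared hierarchy. */
        boolean isSpecializationOf(String a, String b) {
            for (String t = superTypeOf(a); t != null; t = superTypeOf(t)) {
                if (t.equals(b)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Simplified composite rule: a later result replaces the current one only if it refines it. */
    static String compose(Registry registry, List<String> detectorResults) {
        String type = OCTET_STREAM;
        for (String detected : detectorResults) {
            if (registry.isSpecializationOf(detected, type)) {
                type = detected;
            }
        }
        return type;
    }

    public static void main(String[] args) {
        // The zip-container detector fell back to application/zip; MimeTypes then reported ODT.
        List<String> results = List.of(ZIP, ODT);

        Registry current = new Registry();                      // ODT not declared under application/zip
        Registry patched = new Registry().subClassOf(ODT, ZIP); // the proposed registry change

        System.out.println(compose(current, results)); // application/zip
        System.out.println(compose(patched, results)); // application/vnd.oasis.opendocument.text
    }
}
```

In this model, the MimeTypes answer is discarded purely because of the missing parent/child link, even though it is the more precise result.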


In short, I think there are potentially two bugs here:
 # The OPCPackageDetector either shouldn't close the zip while detecting, or the DefaultZipContainerDetector should re-open it if necessary?
 # The registry should be updated to mark up application/vnd.oasis.opendocument.text as a subclass of application/zip?
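For the first option, one way to stop a sub-detector from closing a shared stream is to hand it a wrapper whose close() does nothing, so the caller that opened the stream keeps control of its lifecycle. The toy below uses only the JDK and models the zip stream as a plain InputStream; `CloseShield`, `StrictStream`, `closingDetector`, `odfDetector` and `detect` are hypothetical names, and this is a sketch of the idea, not the actual fix (the "re-open" alternative is not modelled). Apache Commons IO ships a comparable `CloseShieldInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class CloseShieldDemo {

    static final String ODT = "application/vnd.oasis.opendocument.text";
    static final byte[] ODT_BYTES = ("mimetype" + ODT).getBytes(StandardCharsets.US_ASCII);

    /** Wrapper whose close() is a no-op, so a sub-detector cannot close the shared stream. */
    static final class CloseShield extends FilterInputStream {
        CloseShield(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Deliberately empty: whoever opened the underlying stream is responsible for closing it.
        }
    }

    /** Stand-in for the shared zip stream: reads fail once close() has been called. */
    static final class StrictStream extends FilterInputStream {
        private boolean closed;

        StrictStream(byte[] data) {
            super(new ByteArrayInputStream(data));
        }

        @Override
        public int read() throws IOException {
            ensureOpen();
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            ensureOpen();
            return super.read(b, off, len);
        }

        @Override
        public void close() {
            closed = true;
        }

        private void ensureOpen() throws IOException {
            if (closed) {
                throw new IOException("Stream closed");
            }
        }
    }

    /** Reads every byte one at a time (enough for this small toy payload). */
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int c = in.read(); c != -1; c = in.read()) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    /** Mimics the reported failure path: gives up and closes the stream it was handed. */
    static String closingDetector(InputStream in) throws IOException {
        in.close();
        return null;
    }

    /** Looks for the ODF mimetype marker in the stream. */
    static String odfDetector(InputStream in) throws IOException {
        return readAll(in).contains("mimetype" + ODT) ? ODT : null;
    }

    /** Runs both toy detectors over one shared stream, optionally giving each a close shield. */
    static String detect(boolean shield) {
        try (InputStream shared = new StrictStream(ODT_BYTES)) {
            closingDetector(shield ? new CloseShield(shared) : shared);
            String type = odfDetector(shield ? new CloseShield(shared) : shared);
            return type != null ? type : "application/zip";
        } catch (IOException e) {
            // Toy assumption: a failed read ends in the generic zip fallback.
            return "application/zip";
        }
    }

    public static void main(String[] args) {
        System.out.println(detect(false)); // application/zip
        System.out.println(detect(true));  // application/vnd.oasis.opendocument.text
    }
}
```

The shield keeps each detector unaware of the others; the outer try-with-resources is the single place where the shared stream actually gets closed.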



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
