[ 
https://issues.apache.org/jira/browse/TIKA-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415533#comment-17415533
 ] 

Tim Allison commented on TIKA-3556:
-----------------------------------

And then I see a TODO: {{//TODO: OPCBased needs to be last!!!}}...  Ugh.

> DefaultZipContainerDetector returns application/zip for .odt files when 
> OPCPackageDetector is on the classpath
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3556
>                 URL: https://issues.apache.org/jira/browse/TIKA-3556
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 2.1.0
>            Reporter: Simon Gaeremynck
>            Priority: Major
>
> This is happening because the OPCPackageDetector.detect method will [fail and 
> close the underlying zip 
> stream|https://github.com/apache/tika/blob/2.1.0-rc2/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/detect/microsoft/ooxml/OPCPackageDetector.java#L257].
>  When the next detector runs (e.g. OpenDocumentDetector), the stream it 
> receives has been closed and it won't be able to detect anything.
> After all detectors have effectively no-oped, [the 
> DefaultZipContainerDetector falls back to 
> application/zip|https://github.com/apache/tika/blob/2.1.0-rc2/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/src/main/java/org/apache/tika/detect/zip/DefaultZipContainerDetector.java#L209].
> Now, when running with the default CompositeDetector, the next detector is 
> usually the MimeTypes detector. This returns the proper 
> application/vnd.oasis.opendocument.text, but the [CompositeDetector will 
> ignore|https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/detect/CompositeDetector.java#L86]
>  it as that mime type isn't marked up as a subclass of application/zip in 
> [the 
> registry|https://github.com/apache/tika/blob/2.1.0-rc2/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L2327].
>  
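
To make the failure mode described above concrete, here is a minimal toy model in plain Java. It does not use the Tika API: SharedZip, ZipDetector, OPC_LIKE and ODF_LIKE are hypothetical names invented for this sketch, and the entry-name checks are a deliberate simplification of what the real detectors do. The only assumption taken from the description is that the OPC-style detector closes the shared zip when it fails to recognise the container.

```java
import java.util.List;
import java.util.Optional;

final class ClosedZipSketch {

    /** Hypothetical stand-in for an opened zip that every detector in the chain shares. */
    static final class SharedZip {
        private final List<String> entries;
        private boolean closed = false;

        SharedZip(List<String> entries) {
            this.entries = entries;
        }

        boolean contains(String name) {
            if (closed) {
                throw new IllegalStateException("zip already closed");
            }
            return entries.contains(name);
        }

        void close() {
            closed = true;
        }
    }

    /** Hypothetical detector contract: empty means "not my format". */
    interface ZipDetector {
        Optional<String> detect(SharedZip zip);
    }

    // Models the reported behaviour: on a miss, the shared zip is closed.
    static final ZipDetector OPC_LIKE = zip -> {
        if (zip.contains("[Content_Types].xml")) {
            return Optional.of("application/x-tika-ooxml");
        }
        zip.close();
        return Optional.empty();
    };

    // Models a detector that silently gives up when the zip is unusable.
    static final ZipDetector ODF_LIKE = zip -> {
        try {
            return zip.contains("mimetype")
                    ? Optional.of("application/vnd.oasis.opendocument.text")
                    : Optional.empty();
        } catch (IllegalStateException e) {
            return Optional.empty();
        }
    };

    /** Runs the chain in order and falls back to application/zip if nobody matches. */
    static String detect(List<ZipDetector> chain, SharedZip zip) {
        for (ZipDetector detector : chain) {
            Optional<String> type = detector.detect(zip);
            if (type.isPresent()) {
                return type.get();
            }
        }
        return "application/zip";
    }

    public static void main(String[] args) {
        List<String> odtEntries = List.of("mimetype", "content.xml");
        System.out.println(detect(List.of(OPC_LIKE, ODF_LIKE), new SharedZip(odtEntries)));
        System.out.println(detect(List.of(ODF_LIKE, OPC_LIKE), new SharedZip(odtEntries)));
    }
}
```

In this model the result depends only on where the closing detector sits in the chain, which is consistent with the TODO about the OPC-based detector needing to run last. Whether to fix that by ordering, by not closing the zip, or by re-opening it is the open question in the list below.
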
> In short, I think there are two bugs here potentially:
>  # The OPCPackageDetector either shouldn't close the zip while detecting or 
> the DefaultZipContainerDetector should re-open if necessary?
>  # The registry should be updated to mark up 
> application/vnd.oasis.opendocument.text as a subclass of application/zip ?
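
The second point can be illustrated with a small sketch of the "only accept a more specific type" rule that the description attributes to the CompositeDetector. This is a simplified model, not Tika code: the parent map, parentOf, isSpecializationOf and combine are hypothetical helpers, and the assumption that every type without an explicit parent descends from application/octet-stream is made here only so the toy hierarchy has a root.

```java
import java.util.List;
import java.util.Map;

final class SpecializationSketch {

    static final String ROOT = "application/octet-stream";
    static final String ZIP = "application/zip";
    static final String ODT = "application/vnd.oasis.opendocument.text";

    /** Explicit parent if registered, otherwise the root; the root itself has no parent. */
    static String parentOf(Map<String, String> parents, String type) {
        if (type.equals(ROOT)) {
            return null;
        }
        return parents.getOrDefault(type, ROOT);
    }

    /** True if 'current' appears somewhere on the parent chain of 'candidate'. */
    static boolean isSpecializationOf(Map<String, String> parents, String candidate, String current) {
        String ancestor = parentOf(parents, candidate);
        while (ancestor != null) {
            if (ancestor.equals(current)) {
                return true;
            }
            ancestor = parentOf(parents, ancestor);
        }
        return false;
    }

    /** Keeps the running result unless a later detector reports a strict specialization of it. */
    static String combine(Map<String, String> parents, List<String> detectorResults) {
        String type = ROOT;
        for (String detected : detectorResults) {
            if (isSpecializationOf(parents, detected, type)) {
                type = detected;
            }
        }
        return type;
    }

    public static void main(String[] args) {
        // First detector answered application/zip, a later one answered the ODT type.
        List<String> results = List.of(ZIP, ODT);
        System.out.println(combine(Map.of(), results));          // ODT has no registered parent
        System.out.println(combine(Map.of(ODT, ZIP), results));  // ODT registered under zip
    }
}
```

Under these assumptions, registering the ODT type as a child of application/zip is enough for the later, more specific answer to win, even when the zip detector has already fallen back to application/zip.
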



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
