[ https://issues.apache.org/jira/browse/TIKA-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424210#comment-17424210 ]

Hudson commented on TIKA-3556:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #331 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/331/])
TIKA-3556 Fix Open Office mime types to be subclasses of application/zip (tallison: [https://github.com/apache/tika/commit/7b792811502bde32774c7562a60273455e5be575])
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-miscoffice-module/src/test/resources/org/apache/tika/parser/odf/tika-config-detectors.xml
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/src/main/java/org/apache/tika/detect/zip/DefaultZipContainerDetector.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/src/main/java/org/apache/tika/detect/zip/ZipContainerDetector.java
* (edit) tika-core/src/main/java/org/apache/tika/io/TikaInputStream.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/detect/TestContainerAwareDetector.java
* (edit) CHANGES.txt
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/detect/microsoft/ooxml/OPCPackageDetector.java
TIKA-3556 add new subtype of zip to PackageParser and fix unit test (tallison: [https://github.com/apache/tika/commit/c03d6bebc3c6a7a1554021c5da8ce9014b2fda7c])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/detect/TestContainerAwareDetector.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pkg-module/src/main/java/org/apache/tika/parser/pkg/PackageParser.java


> DefaultZipContainerDetector returns application/zip for .odt files when 
> OPCPackageDetector is on the classpath
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3556
>                 URL: https://issues.apache.org/jira/browse/TIKA-3556
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 2.1.0
>            Reporter: Simon Gaeremynck
>            Priority: Major
>
> This is happening because the OPCPackageDetector.detect method will [fail and 
> close the underlying zip 
> stream|https://github.com/apache/tika/blob/2.1.0-rc2/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/detect/microsoft/ooxml/OPCPackageDetector.java#L257].
>  When the next detector runs (e.g. OpenDocumentDetector), the stream it 
> receives has been closed and it won't be able to detect anything.
> After all detectors have effectively no-oped, [the 
> DefaultZipContainerDetector falls back to 
> application/zip|https://github.com/apache/tika/blob/2.1.0-rc2/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/src/main/java/org/apache/tika/detect/zip/DefaultZipContainerDetector.java#L209].
> Now, when running with the default CompositeDetector, the next detector is 
> usually the MimeTypes detector. This returns the proper 
> application/vnd.oasis.opendocument.text, but the [CompositeDetector will 
> ignore|https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/detect/CompositeDetector.java#L86]
>  it as that mime type isn't marked up as a subclass of application/zip in 
> [the 
> registry|https://github.com/apache/tika/blob/2.1.0-rc2/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L2327].
>  
> In short, I think there are potentially two bugs here:
>  # Either the OPCPackageDetector shouldn't close the zip while detecting, or 
> the DefaultZipContainerDetector should re-open it if necessary?
>  # The registry should be updated to mark up 
> application/vnd.oasis.opendocument.text as a subclass of application/zip?
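
The second point can be illustrated with a small, self-contained model of the "keep a later result only if it specializes the current one" rule that the description attributes to the CompositeDetector. This is a sketch under stated assumptions, not Tika's code: the class SpecializationSketch, its methods, and the child-to-parent map standing in for the mime type registry are all hypothetical, and no Tika classes are used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the specialization rule; not Tika's implementation.
class SpecializationSketch {
    static final String OCTET_STREAM = "application/octet-stream";
    static final String ZIP = "application/zip";
    static final String ODT = "application/vnd.oasis.opendocument.text";

    // Parent of a type: the registered parent if there is one, otherwise
    // application/octet-stream as the assumed root (which has no parent).
    static String supertype(Map<String, String> parents, String type) {
        if (parents.containsKey(type)) {
            return parents.get(type);
        }
        return type.equals(OCTET_STREAM) ? null : OCTET_STREAM;
    }

    // True if 'current' is a strict ancestor of 'candidate'.
    // Assumes the parent map contains no cycles.
    static boolean isSpecializationOf(Map<String, String> parents,
                                      String candidate, String current) {
        for (String t = supertype(parents, candidate); t != null;
             t = supertype(parents, t)) {
            if (t.equals(current)) {
                return true;
            }
        }
        return false;
    }

    // Walk the per-detector results in order; a later result replaces the
    // current one only when it is a specialization of it.
    static String combine(Map<String, String> parents, List<String> results) {
        String type = OCTET_STREAM;
        for (String detected : results) {
            if (isSpecializationOf(parents, detected, type)) {
                type = detected;
            }
        }
        return type;
    }

    public static void main(String[] args) {
        // The zip detector answers first, the name/magic-based detector second.
        List<String> results = Arrays.asList(ZIP, ODT);

        Map<String, String> withoutEntry = new HashMap<>();
        System.out.println("without subclass entry: " + combine(withoutEntry, results));

        Map<String, String> withEntry = new HashMap<>();
        withEntry.put(ODT, ZIP); // model of marking ODT as a subclass of zip
        System.out.println("with subclass entry: " + combine(withEntry, results));
    }
}
```

In this model, the zip fallback wins as long as the OpenDocument type has no registered relationship to application/zip; once the relationship is added, the more specific type replaces it. This mirrors only the registry half of the report and says nothing about the closed-stream problem in the first point.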



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
