[ 
https://issues.apache.org/jira/browse/TIKA-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507296#comment-17507296
 ] 

Hudson commented on TIKA-3701:
------------------------------

UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #490 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/490/])
TIKA-3701 -- DefaultZipContainerDetector should backoff to try to detect a 
stream if there's a failure to open a ZipFile. (tallison: 
[https://github.com/apache/tika/commit/1b5b3253d6f31f10f2e35424e372a136a1e2f1ea])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java
* (edit) tika-core/src/test/java/org/apache/tika/TikaTest.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/src/main/java/org/apache/tika/detect/zip/DefaultZipContainerDetector.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/microsoft/ooxml/TruncatedOOXMLTest.java


> ZipDetector on a file should back off to streaming detection on failure to 
> open a zipfile
> -----------------------------------------------------------------------------------------
>
>                 Key: TIKA-3701
>                 URL: https://issues.apache.org/jira/browse/TIKA-3701
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>             Fix For: 2.3.1
>
>         Attachments: Carved-107429888
>
>
> If a file is passed to Tika wrapped as a TikaInputStream with an underlying 
> file, the DefaultZipContainerDetector tries to open a ZipFile.  If the file is 
> truncated or the ZipFile open fails for any other reason, the detector 
> effectively gives up.
> Given that there's still a file available, we should try to do a streaming 
> detect by reopening the file as a regular InputStream.
> If we don't do this, we wind up with different detection results for some 
> truncated OOXML files depending on whether the user sends in a file or a stream.
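
The fallback described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the committed change: the class name ZipDetectFallbackSketch and the method firstEntryName are hypothetical, and plain JDK java.util.zip classes stand in for whatever DefaultZipContainerDetector actually uses internally. The idea is that a random-access ZipFile needs the central directory at the end of the file, which a truncated file lacks, while a streaming read only needs the local headers at the front.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical sketch: names are illustrative and not part of Tika.
class ZipDetectFallbackSketch {

    /**
     * Returns the name of the first zip entry in the file, or null if
     * neither the file-based nor the streaming attempt can read one.
     */
    static String firstEntryName(Path path) {
        // First attempt: random-access open. This relies on the central
        // directory at the end of the file, so it fails on truncated zips.
        try (ZipFile zip = new ZipFile(path.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            return entries.hasMoreElements() ? entries.nextElement().getName() : null;
        } catch (IOException e) {
            // Do not give up here: the file is still available,
            // so fall through to the streaming attempt.
        }

        // Fallback: reopen the same file as a plain InputStream and read
        // the local file headers sequentially from the start.
        try (InputStream is = Files.newInputStream(path);
             ZipInputStream zis = new ZipInputStream(is)) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            return null;
        }
    }
}
```

With a fallback of this shape, a truncated file handed in as a file and the same bytes handed in as a stream would both reach the streaming read, which is the consistency the issue asks for.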



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
