[ 
https://issues.apache.org/jira/browse/TIKA-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17952638#comment-17952638
 ] 

Tim Allison commented on TIKA-4415:
-----------------------------------

Wait, did I not actually merge this between the two regression runs?

[https://github.com/apache/tika/pull/2199]

Let me look at the git log for 3.x and make sure that is in there. :/

> Improve zip detection on truncated zips
> ---------------------------------------
>
>                 Key: TIKA-4415
>                 URL: https://issues.apache.org/jira/browse/TIKA-4415
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Trivial
>             Fix For: 3.2.0
>
>
> On TIKA-4411, while running the regression tests, we found that a file 
> that was identified as an xps file with 3.1.0 is now identified as a zip 
> file with the newer 3.x branch.
> The file was: BCENRNQMIUX64IPK3K5BBMM6JWU7XNKO
> The issue is subtle. The zip has a data descriptor. Our retry technique in 
> the detector calls reset() on the inputstream. For some reason this was 
> throwing an IOException (invalid mark) on a Tika inputstream. I couldn't 
> figure out why this was happening, but if we shift to spooling the zip to a 
> file and then retrying on that, everything works as it did.
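
For anyone following along, here is a rough sketch of the two ideas in the description above, using only plain java.io/java.nio. It is not the code from the PR and not Tika's detector. The class and method names (SpoolRetrySketch, consume, resetFailsAfterOverread, twoPassesOverSpooledCopy) are made up for illustration. The first method shows one well-known way a mark becomes invalid on a BufferedInputStream (reading past the mark's read limit); whether that is what happened on the Tika stream here is an assumption, since the cause was not identified. The second method shows the general shape of spooling to a temp file so that each retry opens a fresh stream and never needs reset().

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SpoolRetrySketch {

    /** Hypothetical stand-in for one detection pass: drains the stream and returns the byte count. */
    public static long consume(InputStream in) throws IOException {
        long total = 0;
        byte[] buf = new byte[16];
        int read;
        while ((read = in.read(buf)) != -1) {
            total += read;
        }
        return total;
    }

    /**
     * Marks with a small read limit, reads well past it, then tries to reset().
     * Returns true if reset() throws an IOException.
     */
    public static boolean resetFailsAfterOverread(byte[] data) throws IOException {
        // Tiny buffer (8 bytes) and tiny read limit (4 bytes) so the mark is easy to invalidate.
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), 8)) {
            in.mark(4);
            consume(in); // first pass reads far beyond the read limit
            try {
                in.reset(); // retry attempt on the same stream
                return false;
            } catch (IOException e) {
                return true;
            }
        }
    }

    /**
     * Spools the stream to a temp file once, then runs two passes,
     * each on a fresh stream opened from that file.
     */
    public static long[] twoPassesOverSpooledCopy(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("spool-sketch", ".zip");
        try {
            // createTempFile already created the file, so REPLACE_EXISTING is required.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            long first;
            long second;
            try (InputStream a = Files.newInputStream(tmp)) {
                first = consume(a);
            }
            try (InputStream b = Files.newInputStream(tmp)) {
                second = consume(b); // the "retry": no mark/reset involved
            }
            return new long[] {first, second};
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64];
        System.out.println("reset failed: " + resetFailsAfterOverread(data));
        long[] counts = twoPassesOverSpooledCopy(new ByteArrayInputStream(data));
        System.out.println("pass sizes: " + counts[0] + ", " + counts[1]);
    }
}
```

The trade-off is the usual one: spooling costs disk I/O and a temp file, but the retry no longer depends on how much of the stream the first attempt consumed.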



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
