[ https://issues.apache.org/jira/browse/TIKA-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17952638#comment-17952638 ]
Tim Allison commented on TIKA-4415:
-----------------------------------

Wait, did I not actually merge this between the two regression runs? [https://github.com/apache/tika/pull/2199]

Let me look at the git log for 3.x and make sure that is in there. :/

> Improve zip detection on truncated zips
> ---------------------------------------
>
>                 Key: TIKA-4415
>                 URL: https://issues.apache.org/jira/browse/TIKA-4415
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Trivial
>             Fix For: 3.2.0
>
>
> On TIKA-4411, while running the regression tests, we found a file that used
> to be identified as an xps file with 3.1.0 was now identified as a zip file
> with the newer 3.x branch.
> The file was: BCENRNQMIUX64IPK3K5BBMM6JWU7XNKO
> The issue is subtle. The zip has a data descriptor. Our retry technique in
> the detector calls reset() on the inputstream. For some reason this was
> throwing an IOException (invalid mark) on a Tika inputstream. I couldn't
> figure out why this was happening, but if we shift to spooling the zip to a
> file and then retrying on that, everything works as it did.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
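For readers unfamiliar with the two retry strategies, here is a minimal, JDK-only sketch of the difference. It is not Tika's detector code and does not use TikaInputStream or commons-compress. The issue notes that the actual cause of the "invalid mark" was never pinned down, so this only illustrates one common way reset() can fail: with java.io.BufferedInputStream, reading past the mark's read limit can invalidate the mark. The class and method names (SpoolZipSketch, rewindSucceeds, entryNamesViaSpooledFile, sampleZip) and the sample entry names are hypothetical, chosen for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class SpoolZipSketch {

    // Hypothetical mark/reset retry: mark the stream, consume bytes the way a
    // first detection attempt would, then try to rewind for a second attempt.
    static boolean rewindSucceeds(InputStream raw, int bufferSize, int markLimit,
                                  int bytesConsumed) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw, bufferSize);
        in.mark(markLimit);
        in.readNBytes(bytesConsumed);
        try {
            in.reset();
            return true;
        } catch (IOException e) {
            // BufferedInputStream drops the mark once its buffer would have to
            // grow past markLimit, and reset() then throws "Resetting to invalid mark".
            return false;
        }
    }

    // Spool the whole stream to a temporary file and read the zip from there.
    // ZipFile reads the central directory, and any later retry can reopen the
    // file instead of rewinding the original stream.
    static List<String> entryNamesViaSpooledFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("spooled-", ".zip");
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            List<String> names = new ArrayList<>();
            try (ZipFile zip = new ZipFile(tmp.toFile())) {
                Enumeration<? extends ZipEntry> entries = zip.entries();
                while (entries.hasMoreElements()) {
                    names.add(entries.nextElement().getName());
                }
            }
            return names;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    // Builds a small in-memory zip. ZipOutputStream sets the data-descriptor flag
    // on DEFLATED entries because sizes and CRC are not known up front.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("[Content_Types].xml"));
            zos.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("FixedDocSeq.fdseq"));
            zos.write("<FixedDocumentSequence/>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZip();
        // Small buffer and small mark limit: consuming the whole zip invalidates the mark.
        System.out.println("rewind ok: " + rewindSucceeds(new ByteArrayInputStream(zip), 8, 16, zip.length));
        // Spooling does not depend on mark/reset at all.
        System.out.println("entries: " + entryNamesViaSpooledFile(new ByteArrayInputStream(zip)));
    }
}
```

The trade-off is the usual one: mark/reset keeps everything in memory but only works while the consumed bytes stay within the mark's limit, whereas spooling costs a temporary file write and makes every retry independent of how much the previous attempt read.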