[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202251#comment-17202251 ]
ASF GitHub Bot commented on TIKA-3196:
--------------------------------------

tballison commented on pull request #364:
URL: https://github.com/apache/tika/pull/364#issuecomment-699004259

   @PeterAlfredLee, thank you for this PR. The one small item that I had to fix was using an instance variable in PKGParser. Parsers have to be thread-safe. Otherwise, tho, this was an elegant solution. Thank you.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> PackageParser should attempt to parse entries from zip files with STORED
> entries with data descriptor
> -----------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3196
>                 URL: https://issues.apache.org/jira/browse/TIKA-3196
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Trevor Bentley
>            Priority: Major
>         Attachments: OOO-107047-0.oxt-145.zip
>
>
> We are currently using tika for text extraction. Currently some sites are
> returning zips that have entries with stored data descriptors which fail to
> extract due to the ZipArchiveInputStream (in commons-compress) defaulting to
> false for 'allowStoredEntriesWithDataDescriptor'.
> Since ZipArchiveInputStream has support for reading zips with data
> descriptors we should attempt to read the zip with that feature enabled when
> we get a data descriptor UnsupportedZipFeatureException.
> Pull Request:
> [https://github.com/apache/tika/pull/356|https://github.com/apache/tika/pull/355]

--
This message was sent by Atlassian Jira (v8.3.4#803005)
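
The retry described in the issue, combined with the thread-safety remark in the review comment, can be sketched roughly as follows. This is not the code from the pull request. To stay self-contained it uses only the JDK, so `RetrySketch`, `EntryReader`, `DataDescriptorException` and `toyReader` are all hypothetical stand-ins: `EntryReader` plays the role of opening a `ZipArchiveInputStream` with or without `allowStoredEntriesWithDataDescriptor`, and `DataDescriptorException` plays the role of the data descriptor `UnsupportedZipFeatureException`. The point is only the control flow: try strict first, re-open the input and retry leniently on that specific failure, and keep the choice out of instance fields so concurrent calls cannot interfere.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class RetrySketch {

    /** Hypothetical stand-in for the data descriptor UnsupportedZipFeatureException. */
    static final class DataDescriptorException extends IOException {
        DataDescriptorException(String message) { super(message); }
    }

    /** Hypothetical stand-in for reading a zip with or without the lenient flag. */
    interface EntryReader {
        String read(InputStream in, boolean allowStoredEntriesWithDataDescriptor) throws IOException;
    }

    // Immutable collaborator: the only instance state, never modified after construction.
    private final EntryReader reader;

    RetrySketch(EntryReader reader) { this.reader = reader; }

    /**
     * Strict attempt first; on the data descriptor failure, retry with the flag enabled.
     * The flag is passed as an argument per call rather than stored in a field,
     * so two threads parsing different inputs never see each other's setting.
     */
    String parse(byte[] data) throws IOException {
        try {
            return reader.read(new ByteArrayInputStream(data), false);
        } catch (DataDescriptorException e) {
            // The failed attempt consumed part of the stream, so start again from byte 0.
            return reader.read(new ByteArrayInputStream(data), true);
        }
    }

    /** Toy reader for illustration: a leading byte of 1 means "needs the lenient flag". */
    static EntryReader toyReader() {
        return (in, allow) -> {
            int first = in.read();
            if (first == 1 && !allow) {
                throw new DataDescriptorException("stored entry with data descriptor");
            }
            return allow ? "lenient" : "strict";
        };
    }
}
```

With the toy reader, `new RetrySketch(RetrySketch.toyReader()).parse(new byte[] {1})` goes through the retry branch, while an input starting with `0` succeeds on the strict attempt. A real parser usually cannot assume the input is an in-memory byte array, so it would need some other way to rewind or re-open the stream before retrying.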