[ https://issues.apache.org/jira/browse/COMPRESS-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194396#comment-17194396 ]
Trevor Bentley commented on COMPRESS-555:
-----------------------------------------
I agree [~ggregory].
[~bodewig] - I appreciate the additional info on the STORED entries. After digging
deeper into Tika, this seems like something that could be handled on the Tika
end. When the UnsupportedZipFeatureException is thrown because of the data
descriptor, we could try to read the zip again using a ZipArchiveInputStream
with allowStoredEntriesWithDataDescriptor enabled.
Created a new ticket for this - https://issues.apache.org/jira/browse/TIKA-3196
I will close this issue, since changing the default is the wrong route to take to solve the problem.
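
For reference, the fallback described above could look roughly like the sketch below. This is only an illustration, not the change tracked in the Tika ticket: the class and method names (StoredEntryFallback, listEntries, listEntriesWithFallback) are made up for this example, and it assumes the zip is available as a file so it can be reopened for the second attempt. It uses the four-argument ZipArchiveInputStream constructor, whose last parameter is allowStoredEntriesWithDataDescriptor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException;
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream;

// Hypothetical helper, not part of Tika or Commons Compress.
public class StoredEntryFallback {

    /** Reads every entry to the end and returns the entry names in order. */
    static List<String> listEntries(Path zip, boolean allowStoredWithDataDescriptor)
            throws IOException {
        List<String> names = new ArrayList<>();
        try (InputStream in = Files.newInputStream(zip);
             ZipArchiveInputStream zin = new ZipArchiveInputStream(
                     in, "UTF-8", true, allowStoredWithDataDescriptor)) {
            byte[] buf = new byte[8192];
            ArchiveEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // Drain the entry; an unsupported feature is reported while reading.
                while (zin.read(buf) != -1) {
                    // A real caller would hand the bytes to a parser here.
                }
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Tries the strict default first, then retries with the flag enabled. */
    static List<String> listEntriesWithFallback(Path zip) throws IOException {
        try {
            return listEntries(zip, false);
        } catch (UnsupportedZipFeatureException e) {
            // Only retry for the data descriptor case; rethrow anything else.
            if (e.getFeature() != UnsupportedZipFeatureException.Feature.DATA_DESCRIPTOR) {
                throw e;
            }
            // The first stream is already partly consumed, so reopen the file.
            return listEntries(zip, true);
        }
    }
}
```

The retry starts again from the beginning of the file, so any entries processed before the exception would be seen twice by a caller that handles entries as it goes; the sketch avoids that by only returning results from the attempt that completes.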
> ZipArchiveInputStream should allow stored entries with data descriptor by
> default
> ---------------------------------------------------------------------------------
>
> Key: COMPRESS-555
> URL: https://issues.apache.org/jira/browse/COMPRESS-555
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.20
> Reporter: Trevor Bentley
> Priority: Major
> Fix For: 1.21
>
>
> We are currently using Tika for text extraction, which uses commons-compress
> for handling zips. Currently some sites are returning zips that contain stored
> entries with data descriptors, which fail to extract because
> ZipArchiveInputStream defaults 'allowStoredEntriesWithDataDescriptor' to
> false.
> Reading stored entries with data descriptors should be enabled by default.
> PR: https://github.com/apache/commons-compress/pull/137