[ https://issues.apache.org/jira/browse/COMPRESS-623?focusedWorklogId=800425&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-800425 ]
ASF GitHub Bot logged work on COMPRESS-623:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 13/Aug/22 12:01
Start Date: 13/Aug/22 12:01
Worklog Time Spent: 10m
Work Description: garydgregory commented on code in PR #306:
URL: https://github.com/apache/commons-compress/pull/306#discussion_r945131683
##########
src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java:
##########
@@ -613,7 +621,7 @@ public InputStream getRawInputStream(final ZipArchiveEntry ze) {
* @throws IOException on error
*/
public void copyRawEntries(final ZipArchiveOutputStream target, final ZipArchiveEntryPredicate predicate)
- throws IOException {
+ throws IOException {
Review Comment:
Just get your IDE to behave ;-) The more noise, the longer it takes to
review and it's annoying as well.
Issue Time Tracking
-------------------
Worklog Id: (was: 800425)
Time Spent: 1.5h (was: 1h 20m)
> make ZipFile's getRawInputStream usable when local headers are not read
> -----------------------------------------------------------------------
>
> Key: COMPRESS-623
> URL: https://issues.apache.org/jira/browse/COMPRESS-623
> Project: Commons Compress
> Issue Type: Improvement
> Reporter: Dawid Weiss
> Priority: Minor
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> I have a somewhat odd use case with gigabytes of ZIP files, each with
> thousands of documents (on comparatively slow network drives). We need to
> restructure these ZIPs without recompressing the files.
> This works almost perfectly with the raw-data copying that ZipFile offers,
> but empirical tests showed a major slowdown when first opening the zip
> files, linked to multiple reads/seeks for local file headers. If an option
> is passed to ignore those headers, raw streams become inaccessible.
> I've taken a look at the code, and getRawInputStream could basically do
> the same thing that getInputStream does: lazily load the missing offset
> via getDataOffset(ZipEntry). In fact, getInputStream could just call
> getRawInputStream directly, which would avoid some code duplication.
> I see speedups on the order of 3-4x for opening and copying random raw
> streams, and all the current tests pass. I filed a PR on GitHub and am
> happy to discuss it there.
> [https://github.com/apache/commons-compress/pull/306]
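The lazy offset loading described in the issue can be sketched roughly as
follows. This is not the code from the PR and does not use Commons Compress:
the Entry class, the getDataOffset(byte[], Entry) signature, and sampleZip()
are hypothetical stand-ins built on java.util.zip. The sketch only assumes
the ZIP layout, where the entry data starts after the 30-byte fixed local
file header plus the file name and extra field whose lengths are stored at
header offsets 26 and 28.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class LazyOffsetSketch {
    /** Fixed part of a ZIP local file header, before the name and extra field. */
    static final int LOCAL_HEADER_FIXED_SIZE = 30;
    static final long OFFSET_UNKNOWN = -1;

    /** Hypothetical stand-in for an archive entry; not the Commons Compress class. */
    static final class Entry {
        final long localHeaderOffset;      // would come from the central directory
        long dataOffset = OFFSET_UNKNOWN;  // resolved on first use
        int headerReads = 0;               // counts local-header reads, for illustration

        Entry(long localHeaderOffset) {
            this.localHeaderOffset = localHeaderOffset;
        }
    }

    /** Reads the local header only if the data offset is not cached yet. */
    static long getDataOffset(byte[] archive, Entry entry) {
        if (entry.dataOffset == OFFSET_UNKNOWN) {
            ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
            int base = (int) entry.localHeaderOffset;
            int nameLength = buf.getShort(base + 26) & 0xFFFF;   // file name length
            int extraLength = buf.getShort(base + 28) & 0xFFFF;  // extra field length
            entry.dataOffset = entry.localHeaderOffset
                    + LOCAL_HEADER_FIXED_SIZE + nameLength + extraLength;
            entry.headerReads++;
        }
        return entry.dataOffset;
    }

    /** Builds a one-entry archive in memory; its local header starts at offset 0. */
    static byte[] sampleZip() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = sampleZip();
        Entry entry = new Entry(0);
        // Opening the archive costs no local-header reads in this model.
        System.out.println("header reads after open: " + entry.headerReads);
        long offset = getDataOffset(zip, entry);
        getDataOffset(zip, entry); // second call is served from the cached value
        System.out.println("data offset: " + offset
                + ", header reads: " + entry.headerReads);
    }
}
```

In this model, opening the archive touches no local headers, and each entry
pays for at most one header read, deferred until its raw stream is requested.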