[ 
https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630569#comment-16630569
 ] 

Stefan Bodewig commented on COMPRESS-466:
-----------------------------------------

Commons Compress parses the extra fields of the local file headers in addition to 
the extra fields of the central directory, which the java.util.zip version does 
not.

The less technical description is that java.util.zip.ZipFile may be missing 
important data for the entries that the Commons Compress version provides. In 
many if not most cases there will be no difference, though.

Right now there is no way around it, but it would certainly be possible to add 
a flag to ZipFile's constructor that says "I know that parsing the central 
directory is enough" and skips this step.

There is at least one thing I'm aware of that won't work if we skip reading the 
local file header: reading entry names or comments from Unicode extra fields. 
See http://commons.apache.org/proper/commons-compress/zip.html#Encoding

The resolveLocalFileHeaderData method does a few additional things that would 
need to be handled in a different way if it were skipped (making sure we know 
all entries that share the same name and ensuring we find the proper start of 
the data stream).
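
To illustrate why finding the start of the data stream needs the local file 
header at all, here is a minimal sketch using only the JDK. It is not the 
Commons Compress implementation; the class name LocalHeaderOffset and its 
helper methods are made up for this example. The local file header has a fixed 
30-byte part followed by a variable-length name and extra field, and those 
lengths may differ from the ones recorded in the central directory, so the 
data offset can only be computed by reading the header of each entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class for illustration only; not part of Commons Compress.
public class LocalHeaderOffset {
    // "PK\003\004" read as a little-endian int.
    static final int LFH_SIGNATURE = 0x04034b50;
    // Size of the fixed part of a local file header.
    static final int LFH_FIXED_SIZE = 30;

    /** Returns the offset of the entry data for the local file header at lfhOffset. */
    public static long dataOffset(byte[] zip, int lfhOffset) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(lfhOffset) != LFH_SIGNATURE) {
            throw new IllegalArgumentException("no local file header at " + lfhOffset);
        }
        // File name length and extra field length are unsigned 16-bit values.
        int nameLen = buf.getShort(lfhOffset + 26) & 0xFFFF;
        int extraLen = buf.getShort(lfhOffset + 28) & 0xFFFF;
        return (long) lfhOffset + LFH_FIXED_SIZE + nameLen + extraLen;
    }

    /** Builds a one-entry archive in memory, stored without compression. */
    public static byte[] buildStoredZip(String name, byte[] content) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(content);
        ZipEntry entry = new ZipEntry(name);
        // STORED entries need size, compressed size and CRC set up front.
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(content.length);
        entry.setCompressedSize(content.length);
        entry.setCrc(crc.getValue());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(entry);
            zos.write(content);
            zos.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] zip = buildStoredZip("a.txt", content);
        // The first local file header of the archive starts at offset 0.
        int start = (int) dataOffset(zip, 0);
        byte[] stored = Arrays.copyOfRange(zip, start, start + content.length);
        System.out.println(new String(stored, StandardCharsets.UTF_8));
    }
}
```

On a real archive this means one seek and one small read per entry, which is 
presumably where the time goes when an archive holds a very large number of 
entries.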

> Opening of a very large zip file is extremely slow compared to 
> java.util.zip.ZipFile
> ------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-466
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-466
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>    Affects Versions: 1.18
>         Environment: Tested both on Linux and OSX 10.13.6.
>            Reporter: Jakob Sultan Ericsson
>            Priority: Major
>
> We have a quite large zip file (35 GB) and try to open it with ZipFile. 
> {code:java}
>         long start = System.currentTimeMillis();
>         try (ZipFile zf = new ZipFile(new File("35gb.zip"))) {
>             System.out.println("File opened..." + (System.currentTimeMillis() - start));
>         }
> {code}
> This code takes about 300 000 - 400 000 ms (5-6 minutes).
> If I run this with the JDK's built-in java.util.zip.ZipFile, the same code 
> takes 300 ms (less than a second). 
> I'm not entirely sure what the problem is, but I did some debugging and 
> basically all the time is spent in
> {code:java}
>     private void resolveLocalFileHeaderData(final Map<ZipArchiveEntry, NameAndComment> entriesWithoutUTF8Flag)
> {code}
> Can anything be done to improve this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
