Creating zip files with many entries will occasionally produce corrupted output
-------------------------------------------------------------------------------

                 Key: COMPRESS-94
                 URL: https://issues.apache.org/jira/browse/COMPRESS-94
             Project: Commons Compress
          Issue Type: Bug
    Affects Versions: 1.0
         Environment: Windows 2003 Server 64 bit, Java 6.0
            Reporter: Anon Devs


Our application produces large numbers of zip files, often with thousands of 
similarly named files contained within each zip. 
When we switched from the standard JDK zip classes to those in Commons 
Compress, we would occasionally produce a zip file that had corrupted index 
entries and would fail to unzip successfully using 7-Zip, WinZip, etc.

Debugging the zip creation showed that the wrong offsets were being 
returned from the hashmap in ZipOutputStream for the entries that were being 
corrupted.  Further analysis revealed that this occurred when the filenames 
being added had a hash collision with another entry in the same output zip 
(which appears to happen quite frequently for us).
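
For anyone wanting to check how often their own entry names collide, here is 
a rough sketch. It is not part of Commons Compress; the class name 
NameCollisionCheck and the sample file names are made up for illustration, 
and it assumes the map key ultimately hashes on String.hashCode() of the 
entry name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper, not part of Commons Compress.
final class NameCollisionCheck {

    /** Groups names by String.hashCode() and keeps groups with 2+ distinct names. */
    static Map<Integer, List<String>> collisions(List<String> names) {
        Map<Integer, List<String>> byHash = new LinkedHashMap<Integer, List<String>>();
        for (String name : names) {
            List<String> bucket = byHash.get(name.hashCode());
            if (bucket == null) {
                bucket = new ArrayList<String>();
                byHash.put(name.hashCode(), bucket);
            }
            if (!bucket.contains(name)) {
                bucket.add(name); // ignore exact duplicates of the same name
            }
        }
        Map<Integer, List<String>> result = new LinkedHashMap<Integer, List<String>>();
        for (Map.Entry<Integer, List<String>> e : byHash.entrySet()) {
            if (e.getValue().size() > 1) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" have the same String hash, so names that differ only
        // in that substring collide as well.
        List<String> names = Arrays.asList(
                "report-Aa.csv", "report-BB.csv", "report-Ab.csv");
        System.out.println(collisions(names));
    }
}
```

With the sample names above, the first two end up in one group and the third 
does not collide with either.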

The issue appears to stem from the fact that ZipArchiveEntry can store the 
entry name either in its superclass, if passed in via the constructor, or in 
its own member attribute, if set later via setName().  I am not sure whether 
this functionality is really required.  Regardless, the root cause of the bug 
is that the equals() and hashCode() methods in ZipArchiveEntry do not always 
use the same filename value in their comparisons.  In fact, if the filename 
of the entry is set in the constructor, equals() will always treat two 
ZipArchiveEntries as equal.  This breaks the offset hashmap whenever there is 
a hash collision, as the map will overwrite the previous entry, believing the 
two keys to be equal.
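
The failure mode can be reproduced without Commons Compress at all. The 
sketch below is only an illustration under stated assumptions: BrokenEntry 
and FixedEntry are hypothetical stand-ins, not the real ZipArchiveEntry, and 
the offsets 0 and 1024 are made-up values. BrokenEntry mimics an equals() 
that never compares names; FixedEntry uses the name in both equals() and 
hashCode().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; not the Commons Compress classes.
final class EntryEqualityDemo {

    /** equals() ignores the name, so any two entries compare as equal. */
    static final class BrokenEntry {
        final String name;
        BrokenEntry(String name) { this.name = name; }
        @Override public int hashCode() { return name.hashCode(); }
        @Override public boolean equals(Object o) { return o instanceof BrokenEntry; }
    }

    /** equals() and hashCode() are both derived from the same name value. */
    static final class FixedEntry {
        final String name;
        FixedEntry(String name) { this.name = name; }
        @Override public int hashCode() { return name.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof FixedEntry && name.equals(((FixedEntry) o).name);
        }
    }

    /** Offset the map reports for "Aa.txt", which was stored as 0. */
    static long brokenOffsetOfFirst() {
        Map<BrokenEntry, Long> offsets = new HashMap<BrokenEntry, Long>();
        BrokenEntry first = new BrokenEntry("Aa.txt");
        offsets.put(first, 0L);
        // "BB.txt" has the same String hash as "Aa.txt" and compares as
        // equal, so this put replaces the value stored for the first entry.
        offsets.put(new BrokenEntry("BB.txt"), 1024L);
        return offsets.get(first);
    }

    /** Same scenario with name-based equality: both entries are kept. */
    static long fixedOffsetOfFirst() {
        Map<FixedEntry, Long> offsets = new HashMap<FixedEntry, Long>();
        FixedEntry first = new FixedEntry("Aa.txt");
        offsets.put(first, 0L);
        offsets.put(new FixedEntry("BB.txt"), 1024L);
        return offsets.get(first);
    }

    public static void main(String[] args) {
        System.out.println("broken: " + brokenOffsetOfFirst());
        System.out.println("fixed:  " + fixedOffsetOfFirst());
    }
}
```

In the broken variant the first entry's offset comes back as 1024 instead of 
0, which matches the kind of wrong offset seen while debugging. Entries whose 
names do not collide land on different hash values and are unaffected, which 
would explain why the corruption is only occasional.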

Patch to follow.


