http://issues.apache.org/bugzilla/show_bug.cgi?id=31930
Zip & Unzip tasks major slowdown

------- Additional Comments From [EMAIL PROTECTED] 2004-11-08 14:29 -------

We use the zip code too (only the expanding/reading part, so I can only speak for that), and we have also noticed how slow it is. You might also want to change your test script so that it reports two times separately:

- The time used to create the ZipFile instance.
- The time used to iterate over the files and directories and expand them.

If you do that, you will see that Ant's ZipFile implementation needs a long time just to open a zip. Why will become clear below.

I am currently reviewing and changing the source code (unfortunately for Ant, this is going to be Java 1.4 code), but I can outline some of the flaws here:

- Zip files use Intel byte order, also known as little endian. The sources do the conversion correctly but speak of "big endian" throughout. This is a cosmetic bug, but wrong documentation leads to wrong derived work.

- The ZipShort/ZipLong classes should have static helper methods that return primitive values. Instantiating an object just for the sake of calling getValue() doesn't help performance. (See the first sketch after this comment.)

- When a zip is opened, the central directory is read and parsed entry by entry. That should be fine; it might be beneficial to read the whole directory into memory in one go, but modern filesystems and caching make the gain small (it depends on the directory size). The only thing skipped is the extra information, and I have no idea why. Wouldn't it be better to parse the extra data, or at least keep the raw bytes for later on-demand parsing? (No real flaws here ;)

- After that, local header information is gathered for each entry: the starting offset of the compressed data, plus the extra information that was skipped before. Now, (a) this is only necessary if decompression or the extra data is actually requested, (b) it causes a lot of stress for the filesystem because large-scale seeking throughout the zip file is required, and (c) the entries are iterated NOT in order of increasing file offset but effectively at random, because the method iterates over the values collection of the hashtable. That really bogs down performance for uncached files.

My ideas are:

- Add static methods to the zip datatype classes.
- Parse extra data only when requested.
- Read the local header only when necessary, i.e. for extra data or decompression. (See the second sketch below.)
- If the headers must be read eagerly, then at least read them in increasing offset order. (See the third sketch below.)

I implemented lazy header reading, and it sped up opening a file over a "slow" network connection from 18 seconds to 2. Ordered header reading alone brought it down to 12 seconds. Decompressing to /dev/null isn't all that much slower than the java.util.zip implementation, whether or not lazy local headers are used.

I also didn't grasp the idea behind the two ZipFile tables, "entries" and "dataOffsets"; I simply store both offsets in the ZipEntry instance (the lazy-entry sketch below does the same).
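To illustrate the static-helper idea from the first bullet, here is a minimal sketch in Java. The class and method names (ZipUtil, getShortValue, getLongValue) are mine for illustration, not Ant's actual API; the point is just that little-endian fields can be decoded straight from a byte array without allocating a ZipShort or ZipLong per field read.

// Hypothetical helper (not Ant's actual API): decode little-endian
// fields straight from a byte array, with no object allocation
// per field read.
final class ZipUtil {
    private ZipUtil() {}

    // Unsigned 16-bit little-endian value.
    static int getShortValue(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    // Unsigned 32-bit little-endian value.
    static long getLongValue(byte[] b, int off) {
        return (b[off] & 0xFFL)
             | ((b[off + 1] & 0xFFL) << 8)
             | ((b[off + 2] & 0xFFL) << 16)
             | ((b[off + 3] & 0xFFL) << 24);
    }
}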
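For the lazy local-header idea, a rough sketch of what I mean, again with hypothetical names (LazyEntry, plus ZipUtil from the sketch above). The central directory already gives us the local header offset; the local header itself, and therefore the data offset and the local extra field, is only read on first access. This also shows what I mean by storing the offsets in the entry itself rather than in separate tables:

// Hypothetical sketch: an entry that remembers where its local
// header lives and parses it lazily, on first use.
class LazyEntry {
    private final long localHeaderOffset; // from the central directory
    private long dataOffset = -1;         // -1 = local header not read yet
    private byte[] localExtra;            // kept for on-demand parsing

    LazyEntry(long localHeaderOffset) {
        this.localHeaderOffset = localHeaderOffset;
    }

    long getLocalHeaderOffset() {
        return localHeaderOffset;
    }

    // Parses the local file header on first use only.
    long getDataOffset(java.io.RandomAccessFile archive)
            throws java.io.IOException {
        if (dataOffset < 0) {
            readLocalHeader(archive);
        }
        return dataOffset;
    }

    private void readLocalHeader(java.io.RandomAccessFile archive)
            throws java.io.IOException {
        // The fixed-size part of the local file header is 30 bytes;
        // the name length sits at offset 26, the extra length at 28.
        byte[] header = new byte[30];
        archive.seek(localHeaderOffset);
        archive.readFully(header);
        int nameLen  = ZipUtil.getShortValue(header, 26);
        int extraLen = ZipUtil.getShortValue(header, 28);
        archive.seek(localHeaderOffset + 30 + nameLen);
        localExtra = new byte[extraLen];
        archive.readFully(localExtra);
        dataOffset = localHeaderOffset + 30 + nameLen + extraLen;
    }
}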
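And for ordered header reading, a method sketch that iterates in increasing offset order instead of the hashtable's value order. It assumes an "entries" Hashtable holding the LazyEntry objects above and an open RandomAccessFile; Java 1.4 style, hence the raw collections and casts:

// Sketch: read every local header eagerly, but in increasing file
// offset order so the seeks only ever run forward through the file.
static void readAllLocalHeaders(java.util.Hashtable entries,
                                java.io.RandomAccessFile archive)
        throws java.io.IOException {
    java.util.List ordered = new java.util.ArrayList(entries.values());
    java.util.Collections.sort(ordered, new java.util.Comparator() {
        public int compare(Object a, Object b) {
            long d = ((LazyEntry) a).getLocalHeaderOffset()
                   - ((LazyEntry) b).getLocalHeaderOffset();
            return d < 0 ? -1 : (d > 0 ? 1 : 0);
        }
    });
    for (java.util.Iterator it = ordered.iterator(); it.hasNext();) {
        ((LazyEntry) it.next()).getDataOffset(archive);
    }
}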