[ https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344081#comment-17344081 ]
Stefan Bodewig commented on COMPRESS-542:
-----------------------------------------

Commit 26924e96 contains an extended sanity check which gets away with far fewer memory allocations (currently a smallish constant number plus a few bytes - up to two longs plus a bit - times the claimed number of folders inside the archive).

Right now it could check a few more things and I'll work on that.

I've enabled the check unconditionally but will revisit that once done, as it seems it would slow down normal operation. We'll have to see how significant the effect is, and how much it can be reduced by removing the checks that are now performed a second time when actually filling the metadata structures.

> Corrupt 7z allocates huge amount of SevenZEntries
> -------------------------------------------------
>
>                 Key: COMPRESS-542
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-542
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.20
>            Reporter: Robin Schimpf
>            Priority: Major
>         Attachments: Reduced_memory_allocation_for_corrupted_7z_archives.patch,
>                      endheadercorrupted.7z, endheadercorrupted2.7z
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We ran into a problem where a 1.43GB corrupt 7z file tried to allocate about
> 138 million SevenZArchiveEntries which will use about 12GB of memory. Sadly
> I'm unable to share the file. If you have enough Memory available the
> following exception is thrown.
> {code:java}
> java.io.IOException: Start header corrupt and unable to guess end Header
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.tryToLocateEndHeader(SevenZFile.java:511)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:470)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:336)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:128)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:369)
> {code}
> 7z itself aborts really quick when I'm trying to list the content of the file.
> {code:java}
> 7z l "corrupt.7z"
> 7-Zip 18.01 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-01-28
> Scanning the drive for archives:
> 1 file, 1537752212 bytes (1467 MiB)
> Listing archive: corrupt.7z
> ERROR: corrupt.7z : corrupt.7z
> Open ERROR: Can not open the file as [7z] archive
> ERRORS:
> Is not archive
> Errors: 1
> {code}
> I hacked together the attached patch which will reduce the memory allocation
> to about 1GB. So lazy instantiation of the entries could be a good solution
> to the problem. Optimal would be to only create the entries if the headers
> could be parsed correctly.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
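
The idea raised in the issue, creating entries only once the header has proven plausible, can be illustrated with a small sketch. This is not the check from commit 26924e96 and not Commons Compress code: the class EntryCountSanityCheck, its methods, and the MIN_HEADER_BYTES_PER_ENTRY constant are hypothetical names, and the assumption that each claimed entry needs at least one byte of header data is chosen purely for illustration.

```java
// Hypothetical helper for illustration only; not part of Commons Compress.
final class EntryCountSanityCheck {

    // Assumption for this sketch: describing one entry takes at least
    // this many bytes of header data.
    static final long MIN_HEADER_BYTES_PER_ENTRY = 1L;

    // Returns true if the claimed entry count could fit into the header
    // bytes that are actually available. Allocates nothing.
    static boolean isPlausible(long claimedEntries, long headerBytesAvailable) {
        if (claimedEntries < 0 || headerBytesAvailable < 0) {
            return false;
        }
        if (claimedEntries > Integer.MAX_VALUE) {
            return false; // Java arrays are indexed by int
        }
        return claimedEntries <= headerBytesAvailable / MIN_HEADER_BYTES_PER_ENTRY;
    }

    // Allocates the entry array only after the claimed count passed the check.
    // Object[] stands in for an array of archive entries.
    static Object[] allocateEntries(long claimedEntries, long headerBytesAvailable) {
        if (!isPlausible(claimedEntries, headerBytesAvailable)) {
            throw new IllegalArgumentException(
                "Header claims " + claimedEntries + " entries but only "
                + headerBytesAvailable + " header bytes are available");
        }
        return new Object[(int) claimedEntries];
    }

    public static void main(String[] args) {
        // A small, consistent header passes the check.
        System.out.println(isPlausible(3, 4096));
        // A corrupt header claiming 138 million entries in 4096 bytes does not.
        System.out.println(isPlausible(138_000_000L, 4096));
    }
}
```

The point of the design is that the check itself needs constant memory, so a corrupt count is rejected before any per-entry allocation happens. A real check would have to derive the lower bound per entry from the 7z header layout rather than from a fixed constant.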