[ https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344081#comment-17344081 ]
Stefan Bodewig commented on COMPRESS-542:
-----------------------------------------
Commit 26924e96 contains an extended sanity check that gets away with far fewer
memory allocations (currently a smallish constant number plus a few bytes, up
to two longs plus a bit, times the number of folders the archive claims to
contain). Right now it could check a few more things, and I'll work on
that. I've enabled the check unconditionally but will revisit that once done, as
it looks like it will slow down normal operation. We'll have to see how
significant the effect is, and how much it can be reduced by removing the
checks that are now performed a second time when actually filling the
metadata structures.
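
To illustrate the general idea of such a check (this is not the code from commit 26924e96), a minimal sketch could compare a count claimed by the header with the number of header bytes actually available before allocating anything. The class name, the method name and the minimum-bytes-per-item parameter are assumptions made for this example only.

```java
import java.io.IOException;

/** Hypothetical helper, for illustration only; not part of Commons Compress. */
final class ClaimedCountCheck {

    private ClaimedCountCheck() { }

    /**
     * Rejects a claimed item count that cannot possibly fit into the bytes
     * that are left in the header, without allocating per-item structures.
     *
     * @param claimedCount    count read from the (possibly corrupt) header
     * @param bytesAvailable  number of header bytes that remain to be read
     * @param minBytesPerItem assumed lower bound for the encoded size of one item
     */
    static int checkedCount(final long claimedCount, final long bytesAvailable,
                            final int minBytesPerItem) throws IOException {
        if (minBytesPerItem <= 0) {
            throw new IllegalArgumentException("minBytesPerItem must be positive");
        }
        if (claimedCount < 0 || claimedCount > Integer.MAX_VALUE) {
            throw new IOException("Claimed count out of range: " + claimedCount);
        }
        // Divide instead of multiplying so the comparison cannot overflow.
        if (claimedCount > bytesAvailable / minBytesPerItem) {
            throw new IOException("Claimed count " + claimedCount
                + " does not fit into " + bytesAvailable + " header bytes");
        }
        return (int) claimedCount;
    }
}
```

Under these assumptions, a header of a few kilobytes that claims 138 million items would be rejected with an IOException before a single entry is allocated.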
> Corrupt 7z allocates huge amount of SevenZEntries
> -------------------------------------------------
>
> Key: COMPRESS-542
> URL: https://issues.apache.org/jira/browse/COMPRESS-542
> Project: Commons Compress
> Issue Type: Bug
> Affects Versions: 1.20
> Reporter: Robin Schimpf
> Priority: Major
> Attachments:
> Reduced_memory_allocation_for_corrupted_7z_archives.patch,
> endheadercorrupted.7z, endheadercorrupted2.7z
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> We ran into a problem where a corrupt 1.43GB 7z file tried to allocate about
> 138 million SevenZArchiveEntries, which would use about 12GB of memory. Sadly,
> I'm unable to share the file. If enough memory is available, the
> following exception is thrown.
> {code:java}
> java.io.IOException: Start header corrupt and unable to guess end Header
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.tryToLocateEndHeader(SevenZFile.java:511)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:470)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:336)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:128)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:369)
> {code}
> 7z itself aborts really quickly when I try to list the contents of the file.
> {code:java}
> 7z l "corrupt.7z"
> 7-Zip 18.01 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-01-28
> Scanning the drive for archives:
> 1 file, 1537752212 bytes (1467 MiB)
> Listing archive: corrupt.7z
> ERROR: corrupt.7z : corrupt.7z
> Open ERROR: Can not open the file as [7z] archive
> ERRORS:
> Is not archive
> Errors: 1
> {code}
> I hacked together the attached patch, which reduces the memory allocation
> to about 1GB. So lazy instantiation of the entries could be a good solution
> to the problem. Ideally, the entries would only be created once the headers
> have been parsed correctly.
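
As a rough illustration of the lazy instantiation suggested in the description (this is neither the attached patch nor the approach taken in Commons Compress), a container could create an entry only when it is first requested. The class name `LazyEntries` and its methods are hypothetical and exist only for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical container, for illustration only; not part of Commons Compress. */
final class LazyEntries<T> {

    private final int size;
    private final IntFunction<T> factory;
    // Only entries that were actually requested occupy memory.
    private final Map<Integer, T> created = new HashMap<>();

    LazyEntries(final int size, final IntFunction<T> factory) {
        if (size < 0) {
            throw new IllegalArgumentException("size must not be negative");
        }
        this.size = size;
        this.factory = factory;
    }

    /** Returns the entry at the given index, creating it on first access. */
    T get(final int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return created.computeIfAbsent(index, i -> factory.apply(i));
    }

    /** Number of entries the archive claims to contain. */
    int size() {
        return size;
    }

    /** Number of entries that have been instantiated so far. */
    int createdCount() {
        return created.size();
    }
}
```

With this shape, a corrupt header that claims 138 million entries costs almost nothing until entries are actually read, so a parsing failure later in the header does not leave millions of unused objects behind.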