[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

Peter Lee (Jira) Fri, 29 May 2020 21:06:13 -0700


    [ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120104#comment-17120104
 ]


Peter Lee commented on COMPRESS-514:
------------------------------------

> _There are a few places where the 7z format says a certain value is a UINT64 
>and we store it inside of a Java long at best. Even if we fix this particular 
>case in some way there will be more problems lurking (that I hope we all catch 
>before they cause ArrayIndexOutOfBoundsExceptions or similar things). Because 
>of this I'd be fine with listing the known limitations._

I think you are talking about _assertFitsIntoInt_ in SevenZFile (we need it 
because we are using arrays, and Java limits array length to an int).

That's a complicated problem and I will try to find a solution.
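
To make the limitation concrete, here is a minimal sketch of that kind of guard. It is reconstructed only from the error message in the reported trace, so the class name, the negative-value check and the exact body are assumptions, not the actual SevenZFile source:

```java
import java.io.IOException;

public class FitsIntoIntSketch {

    // Hypothetical reconstruction of the guard: a 7z UINT64 is held in a Java
    // long, but a Java array can only be sized with an int.
    static int assertFitsIntoInt(String what, long value) throws IOException {
        // A UINT64 above Long.MAX_VALUE would show up as a negative long, so
        // this sketch rejects negatives too (an assumption, not from the source).
        if (value > Integer.MAX_VALUE || value < 0) {
            throw new IOException("Cannot handle " + what + value);
        }
        return (int) value;
    }

    public static void main(String[] args) {
        long unpackSize = 2416988886L; // value taken from the reported trace
        try {
            byte[] header = new byte[assertFitsIntoInt("unpackSize", unpackSize)];
            System.out.println("allocated " + header.length + " bytes");
        } catch (IOException e) {
            // 2416988886 > Integer.MAX_VALUE (2147483647), so we end up here
            System.out.println(e.getMessage());
        }
    }
}
```

Any header whose unpacked size goes through a check like this cannot be read into a single byte[], which is why the fix has to page the data instead of just widening a type.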

 

>  _As long as we detect a bad CRC inside of SevenZFile's constructor, your 
>option 3 sounds reasonable._

 

+1 for this. We are doing a similar thing in _CRC32VerifyingInputStream_. I 
think [~akelday] is worried that the result of the CRC check is only known once 
all the data in _HeaderChannelBuffer_ has been read - which means we will have 
done a lot of work on corrupted data by then. But it seems we do not have other 
options if we are handling a giant amount of data.

 

And for this particular issue, I'm not sure whether we should merge PR #98 or 
not: for the encoded header the PR is OK, because it's hard to imagine the 
header for the encoded header (header of header LOL :)) being bigger than 16MB, 
but it may cause problems for a normal (not encoded) header, because we can no 
longer obtain the CRC if its size is more than 16MB.

I'm not sure whether this is a good idea: we could pass the expected CRC to 
HeaderChannelBuffer's constructor and throw an exception when the data in 
HeaderChannelBuffer is exhausted and the CRC does not match - acting similarly 
to _CRC32VerifyingInputStream_.
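
A rough sketch of the idea, assuming a plain InputStream as the data source. The class and method names here (CrcOnExhaustionSketch, VerifyingStream, crcOf, drain) are made up for illustration; this is neither the attached HeaderChannelBuffer nor the existing _CRC32VerifyingInputStream_ code:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class CrcOnExhaustionSketch {

    // Hypothetical wrapper: updates a CRC32 as data is paged in and compares
    // it with the expected value as soon as the declared size has been read.
    static final class VerifyingStream extends InputStream {
        private final InputStream in;
        private final CRC32 crc = new CRC32();
        private final long expectedCrc;
        private long remaining;

        VerifyingStream(InputStream in, long size, long expectedCrc) {
            this.in = in;
            this.remaining = size;
            this.expectedCrc = expectedCrc;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) {
                return -1;
            }
            int b = in.read();
            if (b < 0) {
                throw new IOException("Truncated header");
            }
            crc.update(b);
            remaining--;
            verifyIfExhausted();
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            if (remaining <= 0) {
                return -1;
            }
            int n = in.read(buf, off, (int) Math.min(len, remaining));
            if (n < 0) {
                throw new IOException("Truncated header");
            }
            crc.update(buf, off, n);
            remaining -= n;
            verifyIfExhausted();
            return n;
        }

        // The check can only run once every byte has been seen.
        private void verifyIfExhausted() throws IOException {
            if (remaining == 0 && crc.getValue() != expectedCrc) {
                throw new IOException("Header CRC mismatch");
            }
        }
    }

    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Reads the stream to the end in small pages, discarding the data.
    static void drain(InputStream in) throws IOException {
        byte[] page = new byte[4];
        while (in.read(page, 0, page.length) != -1) {
            // a real caller would parse each page here
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] header = "pretend header".getBytes(StandardCharsets.UTF_8);
        drain(new VerifyingStream(new ByteArrayInputStream(header), header.length, crcOf(header)));
        System.out.println("CRC verified after reading all pages");
    }
}
```

The drawback is the one mentioned above: a mismatch only surfaces on the read that consumes the last byte, after the earlier pages have already been parsed.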

> SevenZFile fails with encoded header over 2GiB
> ----------------------------------------------
>
>                 Key: COMPRESS-514
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-514
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Archivers
>    Affects Versions: 1.20
>            Reporter: A Kelday
>            Priority: Minor
>         Attachments: HeaderChannelBuffer.java
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize2416988886
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
