ppkarwasz commented on PR #730:
URL: https://github.com/apache/commons-compress/pull/730#issuecomment-3409795305

   @garydgregory,
   
   What do you think about changing 
`SnappyCompressorInputStream.getUncompressedSize()` to **package-private**? 
It’s currently only used in unit tests.
   
   I could open a JIRA issue to track adding this functionality properly in 
`CompressorInputStream` later, where it could serve a clear purpose. For 
example, we could eventually provide a pair of methods like 
`getUncompressedSize()` / `setUncompressedSize()` and guarantee that, if 
`getUncompressedSize()` returns a non-negative value, the stream will **not** 
produce more than that number of bytes when decompressed.
   
   This would help users detect **corrupted streams** or **zip bombs**. In the 
latter case, the declared size might be deliberately understated to bypass 
preliminary validation checks put in place by Compress users. We could enforce 
the declared size to guard against that.
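
   To make the idea concrete, here is a minimal sketch of how a declared size could be enforced. `DeclaredSizeInputStream` is a hypothetical wrapper, not part of Commons Compress. It assumes that a negative value means "unknown" and caps every read at the declared size. Once the limit is reached, it fails with an `IOException` if the underlying stream still has data:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch (not Commons Compress API): enforces a declared
 * uncompressed size. A negative size means "unknown", i.e. no limit.
 */
class DeclaredSizeInputStream extends FilterInputStream {

    private long uncompressedSize = -1;
    private long count;

    DeclaredSizeInputStream(InputStream in) {
        super(in);
    }

    /** Returns the declared size, or -1 if unknown. */
    public long getUncompressedSize() {
        return uncompressedSize;
    }

    /** A non-negative value becomes a hard limit on the bytes produced. */
    public void setUncompressedSize(long uncompressedSize) {
        this.uncompressedSize = uncompressedSize;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (uncompressedSize >= 0) {
            long remaining = uncompressedSize - count;
            if (remaining <= 0) {
                // Limit reached: anything but EOF means the declared size was wrong.
                if (in.read() != -1) {
                    throw new IOException("Stream exceeds declared uncompressed size " + uncompressedSize);
                }
                return -1;
            }
            // Never hand out more bytes than the declared size allows.
            len = (int) Math.min(len, remaining);
        }
        int n = in.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Route skips through read() so skipped bytes also count against the limit.
        if (n <= 0) {
            return 0;
        }
        byte[] scratch = new byte[(int) Math.min(n, 8192)];
        long skipped = 0;
        while (skipped < n) {
            int r = read(scratch, 0, (int) Math.min(n - skipped, scratch.length));
            if (r == -1) {
                break;
            }
            skipped += r;
        }
        return skipped;
    }

    @Override
    public boolean markSupported() {
        // reset() would desynchronize the byte count, so mark/reset is disabled.
        return false;
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }
}
```

   Capping `len` before delegating, rather than counting afterwards, means the caller never receives a byte beyond the declared size. That is the guarantee described above. `skip()` is routed through `read()` so it cannot be used to bypass the limit.
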

   The information could be made available for the various formats as follows:
   
   * **LZMA (.lzma)**, **Zstandard (.zst)**, and **LZ4 framed (.lz4)**: the 
uncompressed size can be declared at the start of the stream, although the field 
is optional (LZMA can mark it as unknown).
   * **Brotli (.br)**: only declares the uncompressed length of each meta-block, 
not of the full stream.
   * **Snappy**: only provides the uncompressed size per chunk, not for the 
full stream.
   * **BZip2**: exposes only an upper bound for the uncompressed size per block.
   * **Archive formats (ZIP, 7z)**: this information can come from the archive 
entry metadata.
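
   For formats that declare a size in a header, reading it is cheap. As an illustration, here are two hypothetical helpers (not part of Commons Compress), written from my reading of the format descriptions. The first parses the 13-byte `.lzma` header, whose last 8 bytes hold the little-endian uncompressed size (-1 meaning unknown). The second decodes the little-endian base-128 varint that starts every raw Snappy block, which is where the per-chunk Snappy size comes from:

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helpers (not Commons Compress API) reading sizes declared in stream headers. */
final class DeclaredSizes {

    private DeclaredSizes() {
    }

    /**
     * Reads the 13-byte .lzma (LZMA_Alone) header: 1 properties byte, a 4-byte
     * dictionary size, then an 8-byte little-endian uncompressed size.
     * Returns -1 (all bits set) when the size is unknown.
     */
    static long lzmaDeclaredSize(InputStream in) throws IOException {
        byte[] header = in.readNBytes(13);
        if (header.length < 13) {
            throw new IOException("Truncated LZMA header");
        }
        long size = 0;
        // Bytes 5..12 hold the size, least significant byte first.
        for (int i = 12; i >= 5; i--) {
            size = (size << 8) | (header[i] & 0xFF);
        }
        return size;
    }

    /**
     * Decodes the little-endian base-128 varint that prefixes a raw Snappy
     * block with its uncompressed length (at most 5 bytes for a 32-bit value).
     */
    static long snappyDeclaredSize(InputStream in) throws IOException {
        long size = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b == -1) {
                throw new IOException("Truncated Snappy preamble");
            }
            size |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return size; // high bit clear: last byte of the varint
            }
        }
        throw new IOException("Snappy preamble longer than 5 bytes");
    }
}
```

   Both values come straight from the input, so they describe what the stream *claims*. That is exactly why a declared size would still need to be enforced while decompressing.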
   
   Given that, we should first define consistent `getUncompressedSize()` 
semantics that make sense across all formats before exposing it publicly for 
Snappy, hence my proposal of making it **package-private**. Do you agree?
   


-- 
This is an automated message from the Apache Git Service.