ppkarwasz commented on PR #730: URL: https://github.com/apache/commons-compress/pull/730#issuecomment-3409795305
@garydgregory, what do you think about making `SnappyCompressorInputStream.getUncompressedSize()` **package-private**? It is currently used only in unit tests.

I could open a JIRA issue to track adding this functionality properly to `CompressorInputStream` later, where it would serve a clear purpose. For example, we could eventually provide a pair of methods such as `getUncompressedSize()` / `setUncompressedSize()` and guarantee that, if `getUncompressedSize()` returns a non-negative value, the stream will **not** produce more than that number of bytes when decompressed. This would help users detect **corrupted streams** or **zip bombs**. In the latter case, the declared size might be deliberately understated to bypass preliminary validation checks that Compress users put in place. Enforcing the declared size during decompression would guard against that.

The information could be made available for the various formats as follows:

* **LZMA (.lzma)**, **Zstandard (.zst)**, and **LZ4 framed (.lz4)**: the uncompressed size can be declared in the header at the start of the stream. The field is optional, or may be marked as unknown.
* **Brotli (.br)**: the stream header does not record the total uncompressed size. Only the length of each meta-block is encoded.
* **Snappy**: only provides the uncompressed size per chunk, not for the full stream.
* **BZip2**: exposes only an upper bound for the uncompressed size per block.
* **Archive formats (ZIP, 7z)**: this information can come from the archive entry metadata.

Given that, we should first define consistent `getUncompressedSize()` semantics that make sense across all formats before exposing it publicly for Snappy. That is why I propose making it **package-private**. Do you agree?
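For illustration, here is a minimal sketch of the size-enforcement guarantee described above. It assumes a hypothetical wrapper class, `DeclaredSizeInputStream`, which is not an existing Commons Compress API. The wrapper counts the bytes returned by the underlying (decompressing) stream and throws an `IOException` once that count exceeds the declared size, so an understated size in a zip bomb would be caught. A negative declared size is assumed to mean "unknown", in which case no limit is enforced.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical wrapper (not part of Commons Compress): fails if the wrapped
 * stream produces more bytes than its declared uncompressed size.
 */
class DeclaredSizeInputStream extends FilterInputStream {
    private final long declaredSize; // negative = unknown, no limit enforced
    private long count;              // bytes produced so far

    DeclaredSizeInputStream(InputStream in, long declaredSize) {
        super(in);
        this.declaredSize = declaredSize;
    }

    /** Returns the declared uncompressed size, or -1 if it is unknown. */
    long getUncompressedSize() {
        return declaredSize < 0 ? -1 : declaredSize;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            record(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        // FilterInputStream.read(byte[]) delegates here, so both overloads are counted.
        int n = super.read(buf, off, len);
        if (n > 0) {
            record(n);
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        if (skipped > 0) {
            record(skipped);
        }
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // reset() would invalidate the byte count
    }

    private void record(long n) throws IOException {
        count += n;
        if (declaredSize >= 0 && count > declaredSize) {
            throw new IOException("Stream produced more than the declared " + declaredSize + " bytes");
        }
    }
}
```

In a real design, the wrapped stream would be the decompressor, and the declared size would come from the format header or the archive entry metadata. The check fires on the first read that crosses the limit, which bounds how much extra data a malicious stream can push through before it is rejected.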
