[
https://issues.apache.org/jira/browse/COMPRESS-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085547#comment-18085547
]
Gary D. Gregory edited comment on COMPRESS-726 at 6/2/26 4:49 PM:
------------------------------------------------------------------
Why not simply document using Commons IO'Is BoundedInputStream? That's exactly
what it's for. I think it is better to compose solutions with what we have
instead of adding complexity.
See
https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BoundedInputStream.html
was (Author: garydgregory):
Why not simply document using Commons IO'Is BoundedInoutStream? That's exactly
what it's for. I think it is better to compose solutions with what we have
instead of adding complexity.
See
https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BoundedInputStream.html
> Build decompression-bomb protection into CompressorInputStream
> --------------------------------------------------------------
>
> Key: COMPRESS-726
> URL: https://issues.apache.org/jira/browse/COMPRESS-726
> Project: Commons Compress
> Issue Type: Improvement
> Reporter: Piotr Karwasz
> Priority: Major
>
> Every concrete {{CompressorInputStream}} already tracks both the compressed
> bytes it has consumed and the uncompressed bytes it has emitted, through
> {{InputStreamStatistics}}. The per-format differences in how the compressed
> count is sourced (raw-input counter vs. internal bit/byte reader) amount only
> to a bounded read-ahead skew (at most one decompressor buffer), which is
> irrelevant for decompression-bomb detection.
> We could therefore rely on {{InputStreamStatistics}} to introduce
> decompression-bomb protection into each {{CompressorInputStream}}. Concretely:
> # Make {{CompressorInputStream}} implement {{InputStreamStatistics}}, so the
> compressed/uncompressed counts are part of the base type for every format.
> # Let users configure decompression-bomb protection parameters on the streams
> the builders / {{CompressorStreamFactory}} return, enforced centrally in the
> base class.
> h2. Background: InputStreamStatistics is sufficient
> {{getUncompressedCount()}} is uniformly exact: it is the base
> {{CompressorInputStream.getBytesRead()}} counter, fed by {{count(...)}} on
> every {{read}}, so it equals the number of bytes actually handed to the
> caller.
> {{getCompressedCount()}} is a live counter in every implementation. Some
> source it from a {{BoundedInputStream}} wrapping the raw input (so it
> includes a fixed amount of decompressor read-ahead), others from the format's
> own bit/byte reader (read-ahead corrected). Because the read-ahead is bounded
> by a constant buffer size, the ratio {{uncompressedCount / compressedCount}}
> converges to the true ratio as the stream grows, and the absolute output
> count is always exact. Both protection strategies below are robust to this
> skew once a small grace period has elapsed.
> h2. Proposed change 1: CompressorInputStream implements InputStreamStatistics
> The base class already defines {{getUncompressedCount()}}. Add
> {{getCompressedCount()}} and declare {{implements InputStreamStatistics}}.
> All concrete subclasses except {{Pack200CompressorInputStream}} already
> override {{getCompressedCount()}}, so the base declaration should provide a
> default (return {{-1}} meaning "unknown") rather than be {{abstract}}, to
> preserve source and binary compatibility for external subclasses. Subclasses
> that have a real count keep overriding it.
> h2. Proposed change 2: configurable bomb protection on the returned streams
> Expose two independent, composable limits, settable through the existing
> builders and through {{CompressorStreamFactory}} (which already carries
> {{memoryLimitInKb}}, so this is a natural companion):
> * *Absolute output cap*: throw once {{getUncompressedCount()}} exceeds a
> configured maximum. Exact and format independent.
> * *Mean ratio guard*: once output passes a configurable grace threshold,
> throw if {{getUncompressedCount() / getCompressedCount()}} exceeds a
> configured maximum ratio. The grace period tolerates legitimately high local
> ratios at the start (headers, dictionaries, long runs) while still enforcing
> a sane mean ratio.
> h2. Gap: Pack200CompressorInputStream
> {{Pack200CompressorInputStream}} is the one exception. It overrides
> {{read(...)}} to delegate straight to an internal bridge stream, never calls
> {{count(...)}} (its Javadoc states {{getBytesRead()}} always returns 0), and
> does not implement {{InputStreamStatistics}}. To bring it under the same
> protection it must route output through {{count(...)}} and provide a real
> {{getCompressedCount()}} (counting bytes pulled from the original input).
> Otherwise it should be documented as unprotected, with
> {{getCompressedCount()}} returning {{-1}} and the ratio guard skipped (the
> absolute cap also cannot fire while output is uncounted).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)