[
https://issues.apache.org/jira/browse/HTTPCLIENT-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083712#comment-18083712
]
Arturo Bernal commented on HTTPCLIENT-2422:
-------------------------------------------
[~snehanie]
I pushed a fix that restores lazy construction of the decompression stream.
Could you please test the latest snapshot from master against your original use
case and confirm whether it resolves the problem?
The intended behavior is that closing or consuming an unread response entity
should no longer fail because of premature gzip stream initialization. Reading
malformed compressed content should still fail.
https://github.com/apache/httpcomponents-client/pull/836
> DecompressingEntity in 5.4+ eagerly creates decompression stream, causing
> ZipException on empty/invalid bodies (regression from 5.2 lazy behavior)
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HTTPCLIENT-2422
> URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2422
> Project: HttpComponents HttpClient
> Issue Type: Bug
> Components: HttpClient (classic)
> Affects Versions: 5.4, 5.5, 5.6
> Reporter: Sneha Murganoor
> Priority: Critical
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In 5.2, DecompressingEntity.getContent() returned a
> LazyDecompressingInputStream that deferred GZIPInputStream creation to the
> first read() call. This allowed responses with Content-Encoding: gzip but
> empty or non-gzip bodies to be handled gracefully: either the stream was
> never read, or the error surfaced on read(), where callers could catch it.
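A minimal sketch of the lazy pattern described above, using only the JDK. `LazyDecoderInputStream` and `StreamDecoder` are hypothetical illustration names, not HttpClient's actual `LazyDecompressingInputStream`. The decoder is applied only on the first `read()`, so closing an unread stream never parses a gzip header, while reading a malformed body still fails.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipException;

public class LazyDecoderDemo {

    /** Hypothetical decoder factory that may throw, e.g. GZIPInputStream::new. */
    @FunctionalInterface
    interface StreamDecoder {
        InputStream apply(InputStream in) throws IOException;
    }

    /** Hypothetical lazy wrapper: defers decoder.apply() until the first read. */
    static final class LazyDecoderInputStream extends InputStream {
        private final InputStream raw;
        private final StreamDecoder decoder;
        private InputStream decoded; // stays null until the first read

        LazyDecoderInputStream(InputStream raw, StreamDecoder decoder) {
            this.raw = raw;
            this.decoder = decoder;
        }

        private InputStream decodedStream() throws IOException {
            if (decoded == null) {
                decoded = decoder.apply(raw); // gzip header is parsed here, not at construction
            }
            return decoded;
        }

        @Override
        public int read() throws IOException {
            return decodedStream().read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return decodedStream().read(b, off, len);
        }

        @Override
        public void close() throws IOException {
            // Close whichever stream exists without forcing the decoder to run.
            if (decoded != null) {
                decoded.close();
            } else {
                raw.close();
            }
        }
    }

    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] notGzip = "plain text".getBytes(StandardCharsets.UTF_8);

        // 1. Closing an unread stream succeeds even though the body is not gzip.
        new LazyDecoderInputStream(new ByteArrayInputStream(notGzip), GZIPInputStream::new).close();

        // 2. Reading malformed content still fails, but only at read() time.
        try (InputStream in = new LazyDecoderInputStream(new ByteArrayInputStream(notGzip), GZIPInputStream::new)) {
            in.read();
            throw new AssertionError("expected ZipException");
        } catch (ZipException expected) {
            System.out.println("read() failed as expected: " + expected.getMessage());
        }

        // 3. A valid gzip body decodes normally.
        try (InputStream in = new LazyDecoderInputStream(new ByteArrayInputStream(gzip("hello")), GZIPInputStream::new)) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

The point of the design is that `close()` checks whether decoding ever started, so the cleanup path (for example, consuming an unread entity) never touches the decoder.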
> In 5.4+, DecompressingEntity (moved to
> org.apache.hc.client5.http.entity.compress) was rewritten to eagerly call
> decoder.apply(super.getContent()) in getContent(). This immediately constructs a
> GZIPInputStream, which reads the gzip magic bytes in its constructor. If the
> body is empty (e.g., chunked transfer with zero-length body) or not actually
> compressed, this throws ZipException: Not in GZIP format at getContent() time,
> before the caller has any opportunity to handle it.
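The eager failure can be reproduced with the JDK alone. This sketch (class and method names are illustrative) constructs a `GZIPInputStream` directly, which is what an eager `decoder.apply(...)` amounts to for gzip. No `read()` is ever issued, yet the constructor throws. For non-gzip bytes the JDK reports `ZipException: Not in GZIP format`; for a completely empty input, the JDK's header parsing typically throws `EOFException` instead. In both cases it is an `IOException` raised from the constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class EagerGzipDemo {

    /** Returns the exception thrown while constructing a GZIPInputStream, or null if none. */
    static IOException constructorFailure(byte[] body) {
        try {
            new GZIPInputStream(new ByteArrayInputStream(body)).close();
            return null; // header parsed successfully without any read() call
        } catch (IOException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // Non-gzip bytes: the magic-number check fails inside the constructor.
        System.out.println(constructorFailure("not compressed".getBytes(StandardCharsets.UTF_8)));
        // prints: java.util.zip.ZipException: Not in GZIP format

        // Zero-length body: the header cannot be read, so the constructor fails here too.
        System.out.println(constructorFailure(new byte[0]));
    }
}
```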
> Reproduction:
> A backend sends:
> {quote}
> HTTP/1.1 200 OK
> Content-Encoding: gzip
> Transfer-Encoding: chunked
> 0\r\n\r\n
> (Empty chunked body with Content-Encoding: gzip header.)
> {quote}
> In 5.2: entity.getContent() succeeds, returns LazyDecompressingInputStream.
> Caller reads EOF without error.
> In 5.4+: entity.getContent() throws java.util.zip.ZipException: Not in GZIP
> format.
> Stack trace:
> {quote}
> java.util.zip.ZipException: Not in GZIP format
> at java.base/java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:197)
> at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:81)
> at org.apache.hc.client5.http.entity.compress.DecompressingEntity.getContent(DecompressingEntity.java:63)
> {quote}
> Context:
> HTTPCLIENT-1432 reported the same class of issue (ZipException on 304
> responses with Content-Encoding: gzip). It was fixed in 4.5.5 and 5.0 Beta1
> by using LazyDecompressingInputStream. The 5.4 rewrite of DecompressingEntity
> removed lazy initialization, reintroducing this failure mode.
> While the backend is arguably misbehaving by sending Content-Encoding: gzip
> with no body, this is common in practice (for example, web servers that add
> the header unconditionally, whether or not compression occurred). The 5.2
> behavior was more resilient to this.
> Suggested fix:
> Restore lazy stream initialization in DecompressingEntity.getContent():
> defer decoder.apply() to first read(), or handle the case where the
> underlying stream is empty before attempting decompression.
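A sketch of the second option, handling an empty body before decompression, again using only the JDK. `openGzipOrEmpty` is a hypothetical helper: it peeks one byte through a `PushbackInputStream`, returns an empty stream on immediate EOF, and otherwise pushes the byte back and wraps the stream in `GZIPInputStream`. Because the peek is a blocking read, inside an entity it would still need to run on the first `read()` rather than in `getContent()`, so it complements lazy initialization rather than replacing it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class EmptyAwareGzip {

    /** Hypothetical helper: empty stream for a zero-length body, else a GZIPInputStream. */
    static InputStream openGzipOrEmpty(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 1);
        int first = in.read(); // blocking peek at the first byte
        if (first == -1) {
            in.close();
            return new ByteArrayInputStream(new byte[0]); // nothing to decompress
        }
        in.unread(first); // restore the byte so GZIPInputStream sees the full header
        return new GZIPInputStream(in);
    }

    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = openGzipOrEmpty(new ByteArrayInputStream(new byte[0]))) {
            System.out.println("empty body -> " + in.read()); // prints: empty body -> -1
        }
        try (InputStream in = openGzipOrEmpty(new ByteArrayInputStream(gzip("hello")))) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // prints: hello
        }
    }
}
```

A non-empty body that is not gzip still fails with `ZipException`, which matches the intended behavior that malformed compressed content should keep failing.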
--
This message was sent by Atlassian Jira
(v8.20.10#820010)