Sneha Murganoor created HTTPCLIENT-2422:
-------------------------------------------

             Summary: DecompressingEntity in 5.4+ eagerly creates decompression stream, causing ZipException on empty/invalid bodies (regression from 5.2 lazy behavior)
                 Key: HTTPCLIENT-2422
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2422
             Project: HttpComponents HttpClient
          Issue Type: Bug
          Components: HttpClient (classic)
    Affects Versions: 5.6, 5.5, 5.4
            Reporter: Sneha Murganoor


In 5.2, DecompressingEntity.getContent() returned a 
LazyDecompressingInputStream that deferred GZIPInputStream creation to the 
first read() call. This allowed responses with Content-Encoding: gzip but empty 
or non-gzip bodies to be handled gracefully — the stream was never read or the 
error surfaced at a point where callers could handle it.

In 5.4+, DecompressingEntity (moved to 
org.apache.hc.client5.http.entity.compress) was rewritten to eagerly call 
decoder.apply(super.getContent()) in getContent(). This immediately creates 
GZIPInputStream, which reads the gzip magic bytes in its constructor. If the 
body is empty (e.g., chunked transfer with zero-length body) or not actually 
compressed, this throws ZipException: Not in GZIP format at getContent() time — 
before the caller has any opportunity to handle it.
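
The eager header read described above can be seen with the JDK alone. The following is a minimal sketch, not HttpClient code: the class and method names (EagerGzipDemo, failsInConstructor) are made up for illustration, and a ByteArrayInputStream stands in for the response body stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class EagerGzipDemo {

    /**
     * Returns true if merely constructing a GZIPInputStream over the given
     * bytes fails, i.e. before any read() has been attempted by the caller.
     */
    static boolean failsInConstructor(byte[] body) {
        // The constructor parses the gzip header immediately.
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return false;
        } catch (IOException e) {
            // ZipException and EOFException are both IOException subclasses.
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] notGzip = "plain text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(failsInConstructor(notGzip));     // true
        System.out.println(failsInConstructor(new byte[0])); // true
    }
}
```

Any code path that wraps the raw stream this way inside getContent() will therefore surface the failure at getContent() time rather than at read time.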

Reproduction:

A backend sends:

HTTP/1.1 200 OK
Content-Encoding: gzip
Transfer-Encoding: chunked

0\r\n\r\n

(Empty chunked body with a Content-Encoding: gzip header.)

In 5.2: entity.getContent() succeeds, returns LazyDecompressingInputStream. 
Caller reads EOF without error.

In 5.4+: entity.getContent() throws java.util.zip.ZipException: Not in GZIP 
format.

Stack trace:
java.util.zip.ZipException: Not in GZIP format
    at java.base/java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:197)
    at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:81)
    at org.apache.hc.client5.http.entity.compress.DecompressingEntity.getContent(DecompressingEntity.java:63)

Context:

HTTPCLIENT-1690 reported the same class of issue (ZipException on 304 responses 
with Content-Encoding: gzip). It was fixed in 4.5.5 and 5.0 Beta1 by using 
LazyDecompressingInputStream. The 5.4 rewrite of DecompressingEntity removed 
lazy initialization, reintroducing this failure mode.

While the backend is arguably misbehaving by sending Content-Encoding: gzip 
with no body, this is common in practice (web servers that add the header 
unconditionally regardless of whether compression occurred). The 5.2 behavior 
was more resilient to this.

Suggested fix:
Restore lazy stream initialization in DecompressingEntity.getContent() — defer 
decoder.apply() to first read(), or handle the case where the underlying stream 
is empty before attempting decompression.
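
One possible shape for the first option (deferring decoder creation to the first read) is sketched below. This is only an illustration under stated assumptions, not the 5.2 LazyDecompressingInputStream and not a patch against DecompressingEntity: StreamDecoder, LazyDecodingInputStream and lazy() are hypothetical names, and the decoder is represented by a plain functional interface rather than the HttpClient type.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class LazyDecodingSketch {

    /** Hypothetical stand-in for the decoder function; may throw on creation. */
    @FunctionalInterface
    interface StreamDecoder {
        InputStream wrap(InputStream raw) throws IOException;
    }

    /** Wraps the raw stream but creates the decoding stream only on first read. */
    static final class LazyDecodingInputStream extends InputStream {
        private final InputStream raw;
        private final StreamDecoder decoder;
        private InputStream decoded; // stays null until the first read

        LazyDecodingInputStream(InputStream raw, StreamDecoder decoder) {
            this.raw = raw;
            this.decoder = decoder;
        }

        private InputStream decoded() throws IOException {
            if (decoded == null) {
                decoded = decoder.wrap(raw); // header parsing happens here
            }
            return decoded;
        }

        @Override
        public int read() throws IOException {
            return decoded().read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return decoded().read(b, off, len);
        }

        @Override
        public void close() throws IOException {
            // Closing never triggers decoder creation.
            if (decoded != null) {
                decoded.close();
            } else {
                raw.close();
            }
        }
    }

    /** Convenience factory used in the example below. */
    static InputStream lazy(InputStream raw, StreamDecoder decoder) {
        return new LazyDecodingInputStream(raw, decoder);
    }

    public static void main(String[] args) throws IOException {
        byte[] notGzip = "plain text".getBytes(StandardCharsets.US_ASCII);
        // No exception here: GZIPInputStream has not been constructed yet.
        InputStream in = lazy(new ByteArrayInputStream(notGzip), GZIPInputStream::new);
        try {
            in.read();
        } catch (IOException e) {
            System.out.println("failure deferred to first read: " + e);
        } finally {
            in.close();
        }
    }
}
```

With this shape, obtaining the stream and closing it without reading never touches the decoder, and a malformed body is reported from read(), where callers already expect an IOException.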



--
This message was sent by Atlassian Jira
(v8.20.10#820010)