[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Murganoor updated HTTPCLIENT-2422:
----------------------------------------
    Description: 
In 5.2, DecompressingEntity.getContent() returned a 
LazyDecompressingInputStream that deferred GZIPInputStream creation to the 
first read() call. This allowed responses with Content-Encoding: gzip but empty 
or non-gzip bodies to be handled gracefully — the stream was never read or the 
error surfaced at a point where callers could handle it.
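
The deferral described above can be sketched as follows. This is a minimal illustration, not the actual HttpClient source: the class name LazyGzipStreamSketch and its fields are hypothetical. It only shows that obtaining and closing the stream never parses the gzip header. A read on a body that is not valid gzip would still fail, but at read time.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical stand-in for a lazily decoding stream; not HttpClient's own class. */
final class LazyGzipStreamSketch extends InputStream {

    private final InputStream raw;   // undecoded response body
    private InputStream decoded;     // created on first read, not in the constructor

    LazyGzipStreamSketch(InputStream raw) {
        this.raw = raw;              // no gzip header is read here
    }

    private InputStream decoded() throws IOException {
        if (decoded == null) {
            // May throw, but only once a caller actually attempts a read.
            decoded = new GZIPInputStream(raw);
        }
        return decoded;
    }

    @Override
    public int read() throws IOException {
        return decoded().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return decoded().read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        // Close whichever stream exists; never create the decoder just to close it.
        if (decoded != null) {
            decoded.close();
        } else {
            raw.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Construction and close succeed even though the body is not valid gzip.
        try (InputStream in = new LazyGzipStreamSketch(new ByteArrayInputStream(new byte[0]))) {
            System.out.println("stream obtained without touching the gzip header");
        }
    }
}
```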

In 5.4+, DecompressingEntity (moved to 
org.apache.hc.client5.http.entity.compress) was rewritten to eagerly call 
decoder.apply(super.getContent()) in getContent(). This immediately creates 
GZIPInputStream, which reads the gzip magic bytes in its constructor. If the 
body is empty (e.g., chunked transfer with zero-length body) or not actually 
compressed, this throws ZipException: Not in GZIP format at getContent() time — 
before the caller has any opportunity to handle it.
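
The constructor-time failure can be checked with the JDK alone, without HttpClient. The sketch below uses a hypothetical helper, constructorFailure, that reports whatever IOException the GZIPInputStream constructor raises. As far as I can tell, the exact exception type depends on the input: on the JDKs I have looked at, bytes that are not gzip produce ZipException, while a completely empty stream tends to surface as EOFException. Both are raised before any read() call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

final class EagerGzipFailureDemo {

    /** Returns the exception raised while constructing GZIPInputStream, or null if none. */
    static IOException constructorFailure(byte[] body) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return null; // header parsed successfully
        } catch (IOException e) {
            return e;    // raised before any read() call was made
        }
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        byte[] plain = "not gzip".getBytes(StandardCharsets.US_ASCII);
        System.out.println("empty body: " + constructorFailure(empty));
        System.out.println("plain body: " + constructorFailure(plain));
    }
}
```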

Reproduction:

A backend sends:
{quote}
HTTP/1.1 200 OK
Content-Encoding: gzip
Transfer-Encoding: chunked

0\r\n\r\n
{quote}
(An empty chunked body with a Content-Encoding: gzip header.)

In 5.2: entity.getContent() succeeds, returns LazyDecompressingInputStream. 
Caller reads EOF without error.

In 5.4+: entity.getContent() throws java.util.zip.ZipException: Not in GZIP 
format.

Stack trace:

{quote}
java.util.zip.ZipException: Not in GZIP format
    at java.base/java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:197)
    at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:81)
    at org.apache.hc.client5.http.entity.compress.DecompressingEntity.getContent(DecompressingEntity.java:63)
{quote}

Context:

HTTPCLIENT-1690 reported the same class of issue (ZipException on 304 responses 
with Content-Encoding: gzip). It was fixed in 4.5.5 and 5.0 Beta1 by using 
LazyDecompressingInputStream. The 5.4 rewrite of DecompressingEntity removed 
lazy initialization, reintroducing this failure mode.

While the backend is arguably misbehaving by sending Content-Encoding: gzip 
with no body, this is common in practice (web servers that add the header 
unconditionally regardless of whether compression occurred). The 5.2 behavior 
was more resilient to this.

Suggested fix:
Restore lazy stream initialization in DecompressingEntity.getContent() — defer 
decoder.apply() to first read(), or handle the case where the underlying stream 
is empty before attempting decompression.
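
The second option, checking for an empty stream before decompressing, could look roughly like this. It is a sketch under assumptions: EmptyAwareGzipDecoder and its decode method are hypothetical names, not part of HttpClient, and GZIPInputStream stands in for whatever decoder is configured.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

final class EmptyAwareGzipDecoder {

    /** Hypothetical helper: decodes gzip unless the raw body has no bytes at all. */
    static InputStream decode(InputStream raw) throws IOException {
        PushbackInputStream peekable = new PushbackInputStream(raw, 1);
        int first = peekable.read();          // blocks until a byte or EOF is available
        if (first == -1) {
            return peekable;                  // empty body: hand back a stream already at EOF
        }
        peekable.unread(first);               // put the byte back for the gzip header parser
        return new GZIPInputStream(peekable); // non-empty, non-gzip bodies still fail here
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = decode(new ByteArrayInputStream(new byte[0]))) {
            System.out.println("first read on empty body: " + in.read());
        }
    }
}
```

Note that this variant still performs I/O when the stream is obtained, and it only covers the empty-body case. A body that is non-empty but not gzip would keep failing eagerly, so deferring decoder creation to the first read() is the more general of the two options.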


> DecompressingEntity in 5.4+ eagerly creates decompression stream, causing 
> ZipException on empty/invalid bodies (regression from 5.2 lazy behavior)
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HTTPCLIENT-2422
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2422
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 5.4, 5.5, 5.6
>            Reporter: Sneha Murganoor
>            Priority: Critical
>


