[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412835#comment-17412835
 ] 

Tim Allison commented on HTTPCLIENT-2176:
-----------------------------------------

There doesn't appear to be any Content-Length set in the response headers:
{noformat}
Date: Thu, 09 Sep 2021 20:44:22 GMT
Server: Apache
Content-Disposition: attachment; filename=PATRIMONIOS_CULTURAIS_E_PANDEMIA.pdf
Content-Transfer-Encoding: binary
Vary: Accept-Encoding
Upgrade: h2
Connection: Upgrade, Keep-Alive
Keep-Alive: timeout=5, max=500
Content-Type: application/save
{noformat}

When I print out some properties of the entity:

{noformat}
        System.out.println("LENGTH: " + r.getEntity().getContentLength());
        System.out.println("chunked: " + r.getEntity().isChunked());
        System.out.println("repeatable: " + r.getEntity().isRepeatable());
        System.out.println("streaming: " + r.getEntity().isStreaming());
{noformat}

I see:

{noformat}
LENGTH: -1
chunked: false
repeatable: false
streaming: true
{noformat}
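
One possible reading of this output, offered as an assumption rather than anything confirmed in this thread: the stack trace in the issue passes through GZIPInputStream and LazyDecompressingInputStream, so the entity inspected here may be a decompressing wrapper whose decoded size is not known up front. As far as I recall, HttpClient 4.5.x also drops Content-Length from the response once it wraps the entity that way. The stdlib-only sketch below (GzipLengthDemo is a hypothetical name, not HttpClient code) only illustrates that the length on the wire and the length the caller reads are different numbers:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; it does not use or reproduce HttpClient internals.
public class GzipLengthDemo {

    /** Gzip-compresses the bytes, standing in for a compressed response body. */
    public static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        // Closing the GZIP stream has flushed the trailer into bos.
        return bos.toByteArray();
    }

    /** Counts the bytes a caller sees after transparent decompression. */
    public static long decodedLength(byte[] wire) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(wire))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder payload; the real PDF from the issue is not used here.
        byte[] raw = new byte[20000];
        Arrays.fill(raw, (byte) 'a');
        byte[] wire = gzip(raw);
        System.out.println("wire length (what Content-Length would count): " + wire.length);
        System.out.println("decoded length (what the caller reads): " + decodedLength(wire));
    }
}
```

If that is what is happening here, building the client with HttpClientBuilder.create().disableContentCompression().build() should expose the raw Content-Length and Content-Encoding headers for comparison.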

> Premature end of Content-Length delimited message body but works with wget
> --------------------------------------------------------------------------
>
>                 Key: HTTPCLIENT-2176
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2176
>             Project: HttpComponents HttpClient
>          Issue Type: Task
>          Components: HttpClient (classic)
>    Affects Versions: 4.5.13
>         Environment: httpclient: 4.5.13
> httpcore: 4.4.14
> java 11 (archaic): openjdk version "11.0.4" 2019-07-16
>            Reporter: Tim Allison
>            Priority: Minor
>
> I'm doing a recrawl of truncated files from CommonCrawl in support of work on 
> Apache Tika, and I've found a few files where I'm able to download the files 
> successfully with wget but with httpclient, I'm getting:
> {noformat}
> org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 216,481; received: 203,820)
>       at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
>       at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:198)
>       at org.apache.http.impl.io.ContentLengthInputStream.close(ContentLengthInputStream.java:101)
>       at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:142)
>       at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
>       at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:172)
>       at java.base/java.util.zip.InflaterInputStream.close(InflaterInputStream.java:232)
>       at java.base/java.util.zip.GZIPInputStream.close(GZIPInputStream.java:137)
>       at org.apache.http.client.entity.LazyDecompressingInputStream.close(LazyDecompressingInputStream.java:94)
>       at FetcherTest.testBasic(FetcherTest.java:40)
> {noformat}
> The triggering file is: https://direitosculturais.com.br/pdf.php?id=151
> Example with all defaults:
> {noformat}
>         String url = "https://direitosculturais.com.br/pdf.php?id=151";
>         HttpClient client = HttpClientBuilder.create().build();
>         HttpGet get = new HttpGet(url);
>         HttpResponse r = client.execute(get);
>         Path output = Paths.get("/data/tmp.pdf");
>         try (InputStream is = r.getEntity().getContent()) {
>             Files.copy(is, output, StandardCopyOption.REPLACE_EXISTING);
>         }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

