[ https://issues.apache.org/jira/browse/HTTPCLIENT-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412835#comment-17412835 ]
Tim Allison commented on HTTPCLIENT-2176:
-----------------------------------------

There doesn't appear to be any content-length set in the headers?

{noformat}
Date: Thu, 09 Sep 2021 20:44:22 GMT
Server: Apache
Content-Disposition: attachment; filename=PATRIMONIOS_CULTURAIS_E_PANDEMIA.pdf
Content-Transfer-Encoding: binary
Vary: Accept-Encoding
Upgrade: h2
Connection: Upgrade, Keep-Alive
Keep-Alive: timeout=5, max=500
Content-Type: application/save
{noformat}

When I print out some features of the entity

{noformat}
System.out.println("LENGTH: " + r.getEntity().getContentLength());
System.out.println("chunked: " + r.getEntity().isChunked());
System.out.println("repeatable: " + r.getEntity().isRepeatable());
System.out.println("streaming: " + r.getEntity().isStreaming());
{noformat}

I see:

{noformat}
LENGTH: -1
chunked: false
repeatable: false
streaming: true
{noformat}

> Premature end of Content-Length delimited message body but works with wget
> --------------------------------------------------------------------------
>
>                 Key: HTTPCLIENT-2176
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2176
>             Project: HttpComponents HttpClient
>          Issue Type: Task
>          Components: HttpClient (classic)
>    Affects Versions: 4.5.13
>         Environment: httpclient: 4.5.13
> httpcore: 4.4.14
> java 11 (archaic): openjdk version "11.0.4" 2019-07-16
>            Reporter: Tim Allison
>            Priority: Minor
>
> I'm doing a recrawl of truncated files from CommonCrawl in support of work on
> Apache Tika, and I've found a few files where I'm able to download the files
> successfully with wget but with httpclient, I'm getting:
> {noformat}
> org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 216,481; received: 203,820)
>     at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
>     at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:198)
>     at org.apache.http.impl.io.ContentLengthInputStream.close(ContentLengthInputStream.java:101)
>     at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:142)
>     at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
>     at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:172)
>     at java.base/java.util.zip.InflaterInputStream.close(InflaterInputStream.java:232)
>     at java.base/java.util.zip.GZIPInputStream.close(GZIPInputStream.java:137)
>     at org.apache.http.client.entity.LazyDecompressingInputStream.close(LazyDecompressingInputStream.java:94)
>     at FetcherTest.testBasic(FetcherTest.java:40)
> {noformat}
> The triggering file is: https://direitosculturais.com.br/pdf.php?id=151
> Example all defaults:
> {noformat}
> String url = "https://direitosculturais.com.br/pdf.php?id=151";
> HttpClient client = HttpClientBuilder.create().build();
> HttpGet get = new HttpGet(url);
> HttpResponse r = client.execute(get);
> Path output = Paths.get("/data/tmp.pdf");
> try (InputStream is = r.getEntity().getContent()) {
>     Files.copy(is, output, StandardCopyOption.REPLACE_EXISTING);
> }
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@hc.apache.org
For additional commands, e-mail: dev-h...@hc.apache.org
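To make the "Premature end of Content-Length delimited message body" exception in this issue concrete, here is a minimal, JDK-only sketch of the check a Content-Length delimited read has to perform. `LengthCheckingInputStream` and `check` are hypothetical names used only for illustration. This is not HttpClient's `ContentLengthInputStream`, which throws `ConnectionClosedException` and formats its numbers differently; the sketch just throws a plain `IOException`. The quoted trace shows the failure surfacing from `ContentLengthInputStream.close()`, which calls `read()` on the remainder of the declared body, so the sketch also drains in `close()`.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical JDK-only helper, NOT HttpClient's ContentLengthInputStream:
 * delivers at most a declared number of bytes (e.g. from a Content-Length
 * header) and throws if the underlying stream ends before that many arrive.
 */
public class LengthCheckingInputStream extends FilterInputStream {
    private final long expected; // declared body length
    private long received;       // bytes delivered to the caller so far

    public LengthCheckingInputStream(InputStream in, long expected) {
        super(in);
        this.expected = expected;
    }

    @Override
    public int read() throws IOException {
        if (received >= expected) {
            return -1; // declared body fully consumed; later bytes belong to no body
        }
        int b = super.read();
        if (b == -1) {
            throw truncated();
        }
        received++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (received >= expected) {
            return -1;
        }
        // Never read past the declared end of the body.
        int chunk = (int) Math.min(len, expected - received);
        int n = super.read(buf, off, chunk);
        if (n == -1) {
            throw truncated();
        }
        received += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        try {
            // Drain whatever is left of the declared body so a short body is
            // still detected when the caller stops reading early.
            byte[] scratch = new byte[4096];
            while (read(scratch, 0, scratch.length) != -1) {
                // discard
            }
        } finally {
            super.close();
        }
    }

    private IOException truncated() {
        return new IOException("Premature end of Content-Length delimited message body (expected: "
                + expected + "; received: " + received + ")");
    }

    /** Reads 4 bytes, then closes; returns "ok" or the truncation message. */
    public static String check(byte[] body, long declaredLength) {
        try (InputStream in = new LengthCheckingInputStream(new ByteArrayInputStream(body), declaredLength)) {
            in.read(new byte[4]); // stop early, like a consumer that is done with the data
            return "ok";
        } catch (IOException e) {
            return e.getMessage(); // thrown from close() while draining
        }
    }

    public static void main(String[] args) {
        byte[] body = "0123456789".getBytes(StandardCharsets.US_ASCII); // 10 bytes actually sent
        System.out.println(check(body, 10)); // prints "ok"
        System.out.println(check(body, 12)); // prints the "expected: 12; received: 10" message
    }
}
```

Draining in `close()` trades a little extra reading for the guarantee that a body shorter than its declared length is always reported rather than silently accepted.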