Yihua Huang created HTTPCLIENT-1432:
---------------------------------------
Summary: Lazy decompressing of HttpEntity.getContent()
Key: HTTPCLIENT-1432
URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1432
Project: HttpComponents HttpClient
Issue Type: Improvement
Components: HttpClient
Affects Versions: 4.3.1, 4.3.2
Reporter: Yihua Huang
Priority: Minor
In 4.3, DecompressingEntity is used to decompress the entity of an HTTP
response. When DecompressingEntity.getContent() is called, a new
DeflateInputStream or GZIPInputStream is created, and the header of the
compressed data is read and checked right away:
    InputStream decorate(final InputStream wrapped) throws IOException {
        return new GZIPInputStream(wrapped);
    }
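To illustrate why the eager construction matters, here is a minimal sketch
using only the JDK. The class name EagerGzipHeaderCheck and its helper method
are hypothetical and exist only for this demonstration; the empty byte array
stands in for a response that declares gzip but carries no body.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical demo class, not part of HttpClient.
class EagerGzipHeaderCheck {

    // Returns true if merely constructing a GZIPInputStream over an empty
    // body fails with EOFException, before any read() call is made.
    static boolean failsOnEmptyBody() {
        InputStream emptyBody = new ByteArrayInputStream(new byte[0]);
        try {
            // The constructor reads the gzip header immediately.
            new GZIPInputStream(emptyBody).close();
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            // Some other I/O failure; not the case being demonstrated.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails on empty body: " + failsOnEmptyBody());
    }
}
```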
In some cases we don't really need to decompress the entity. For example, for
"http://baike.baidu.com/search/word?word=httpclient&pic=1&sug=1&enc=utf8" the
response status code is 302 and the response contains the header
"Content-Encoding: gzip" but no entity data (this happens only sometimes). In
RedirectExec.execute() the entity is not read, but at the end it tries to
close the input stream via EntityUtils.consume(response.getEntity()). When
entity.getContent() is called inside EntityUtils.consume(), an EOFException
is thrown and the redirect cannot continue.
In this case we don't care about the actual entity, even if its compression
format is invalid.
In my opinion, the decompressing stream should be created and its format
checked ONLY when the content actually needs to be read, not when the stream
is merely being closed. So I wrote LazyDecompressingInputStream, a wrapper
that does not create the decompressing stream until read() is called. This
way more websites will be supported.
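The idea can be sketched roughly as follows. This is not the patch itself,
only a JDK-only illustration under stated assumptions: the class name
LazyGzipInputStream and the helpers closesEmptyBodyCleanly() and roundTrip()
are hypothetical, and only gzip is handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of a lazily decompressing wrapper (gzip only).
class LazyGzipInputStream extends InputStream {
    private final InputStream wrapped;
    private InputStream decompressing; // stays null until the first read

    LazyGzipInputStream(InputStream wrapped) {
        this.wrapped = wrapped;
    }

    // Creates the GZIPInputStream (and so checks the header) on first use.
    private InputStream stream() throws IOException {
        if (decompressing == null) {
            decompressing = new GZIPInputStream(wrapped);
        }
        return decompressing;
    }

    @Override
    public int read() throws IOException {
        return stream().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return stream().read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        // Closing never triggers the header check.
        if (decompressing != null) {
            decompressing.close();
        } else {
            wrapped.close();
        }
    }

    // Demo helper: closing an unread stream over an empty body must not throw.
    static boolean closesEmptyBodyCleanly() {
        try {
            new LazyGzipInputStream(new ByteArrayInputStream(new byte[0])).close();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Demo helper: gzip a string, then read it back through the lazy wrapper.
    static String roundTrip(String text) {
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
                gzip.write(text.getBytes(StandardCharsets.UTF_8));
            }
            ByteArrayOutputStream plain = new ByteArrayOutputStream();
            try (InputStream in = new LazyGzipInputStream(
                    new ByteArrayInputStream(compressed.toByteArray()))) {
                byte[] buf = new byte[256];
                int n;
                while ((n = in.read(buf, 0, buf.length)) != -1) {
                    plain.write(buf, 0, n);
                }
            }
            return new String(plain.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(closesEmptyBodyCleanly());
        System.out.println(roundTrip("hello"));
    }
}
```

With a wrapper of this kind, a malformed or missing gzip header is still
reported, but only to a caller that actually tries to read the content.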