[ https://issues.apache.org/jira/browse/HTTPCLIENT-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903575#comment-13903575 ]
Chris Heald commented on HTTPCLIENT-1461:
-----------------------------------------
Corroborating this. Manually applying the given patch substantially improved
our retrieval time for gzip-compressed resources.
> GZIP decoding is very slow
> --------------------------
>
> Key: HTTPCLIENT-1461
> URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1461
> Project: HttpComponents HttpClient
> Issue Type: Bug
> Components: HttpClient
> Affects Versions: 4.3.2
> Reporter: Sebastiano Vigna
> Priority: Critical
> Labels: regression
>
> In 4.3.1, LazyDecompressingInputStream was introduced. However,
> LazyDecompressingInputStream subclasses InputStream without overriding the
> multi-byte read() method, and the inherited method does a byte-by-byte read.
> This is a trace showing what happens:
> java.util.zip.Inflater.inflateBytes(Inflater.java:Unknown line)
> java.util.zip.Inflater.inflate(Inflater.java:259)
> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
> java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:122)
> org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:56)
> java.io.InputStream.read(InputStream.java:179)
> it.unimi.di.law.warc.util.InspectableCachedHttpEntity.copyContent(InspectableCachedHttpEntity.java:67)
> copyContent() would love to read(byte[],int,int) into a buffer, but since
> LazyDecompressingInputStream doesn't override it, the call falls through to
> the byte-by-byte method inherited from InputStream. For each byte, that method
> calls the one-byte read() of LazyDecompressingInputStream, which invokes the
> one-byte read() of InflaterInputStream, which does a multi-byte, length-one
> read on GZIPInputStream, which makes a similar call on InflaterInputStream,
> which in turn makes a length-one call to the native inflateBytes() method.
> Thus, for each byte there is a native-method call. The result is a 10x-50x
> increase in CPU usage, which turns into a 10x-50x decrease in speed if, as in
> our case, you have 7000 threads downloading in parallel.
> Overriding read(byte[],int,int) in LazyDecompressingInputStream will solve
> the problem:
>     @Override
>     public int read(byte[] b, int off, int len) throws IOException {
>         initWrapper();
>         return wrapperStream.read(b, off, len);
>     }
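
For anyone who wants to see the effect in isolation, here is a minimal sketch
that reproduces the fallback described above without HttpClient or gzip. All
class and method names in it (LazyReadDemo, CountingStream, NaiveWrapper,
BulkWrapper, singleByteReadsFor) are made up for illustration and are not part
of HttpClient. It only assumes the documented behaviour of
java.io.InputStream, whose default read(byte[],int,int) calls read() once per
byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LazyReadDemo {

    /** Hypothetical helper: counts how many one-byte reads reach the source. */
    public static class CountingStream extends InputStream {
        private final InputStream in;
        public int singleByteReads = 0;

        public CountingStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            singleByteReads++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, len);
        }
    }

    /** Overrides only read(), so bulk reads use InputStream's per-byte loop. */
    public static class NaiveWrapper extends InputStream {
        protected final InputStream wrapped;

        public NaiveWrapper(InputStream wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public int read() throws IOException {
            return wrapped.read();
        }
    }

    /** Same wrapper, plus the bulk-read override suggested in the issue. */
    public static class BulkWrapper extends NaiveWrapper {
        public BulkWrapper(InputStream wrapped) {
            super(wrapped);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return wrapped.read(b, off, len);
        }
    }

    /** Drains 4096 bytes through a wrapper using a 1024-byte buffer. */
    public static int singleByteReadsFor(boolean overrideBulkRead) throws IOException {
        CountingStream source =
                new CountingStream(new ByteArrayInputStream(new byte[4096]));
        InputStream wrapper =
                overrideBulkRead ? new BulkWrapper(source) : new NaiveWrapper(source);
        byte[] buf = new byte[1024];
        while (wrapper.read(buf, 0, buf.length) != -1) {
            // discard the data; only the call counts matter here
        }
        return source.singleByteReads;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("without override: " + singleByteReadsFor(false));
        System.out.println("with override:    " + singleByteReadsFor(true));
    }
}
```

Without the override, every byte requested by the caller turns into a separate
read() on the wrapped stream; with it, the caller's buffer is passed straight
through and no one-byte reads occur. In the real case each of those one-byte
reads ends in a native inflate call, which is where the CPU time goes.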