On 2015-05-23 at 22:29, Oleg Kalnichevski wrote:
On Sat, 2015-05-23 at 22:09 +0200, Michael Osipov wrote:
Hi,
we are experiencing a (slight) performance problem with HttpClient 4.4.1
while downloading big files from a remote server in the corporate intranet.
A simple test client:
HttpClientBuilder builder = HttpClientBuilder.create();
try (CloseableHttpClient client = builder.build()) {
    HttpGet get = new HttpGet("...");
    long start = System.nanoTime();
    HttpResponse response = client.execute(get);
    HttpEntity entity = response.getEntity();
    File file = File.createTempFile("prefix", null);
    try (OutputStream os = new FileOutputStream(file)) {
        entity.writeTo(os);
    }
    long stop = System.nanoTime();
    long contentLength = file.length();
    long diff = stop - start;
    System.out.printf("Duration: %d ms%n",
        TimeUnit.NANOSECONDS.toMillis(diff));
    System.out.printf("Size: %d%n", contentLength);
    // bytes/ns * (1e9 ns/s / 1e6 bytes/MB) = MB/s
    float speed = contentLength / (float) diff * (1_000_000_000 / 1_000_000);
    System.out.printf("Speed: %.2f MB/s%n", speed);
}
After at least 10 repetitions I see that the 182 MB file downloads in
about 24 000 ms, at roughly 8 MB/s peak. I cannot top that.
I have tried this over and over again with curl, which is able to
saturate the entire LAN connection (100 Mbit/s).
My tests are done on Windows 7 64 bit, JDK 7u67 32 bit.
Any idea what the bottleneck might be?
Thanks for the quick response.
(1) Curl should be using zero copy file transfer which Java blocking i/o
does not support. HttpAsyncClient on the other hand supports zero copy
file transfer and generally tends to perform better when writing content
out directly to the disk.
I did try this [1] example and my heap exploded. After increasing it to
-Xmx1024M, it did saturate the entire connection.
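For anyone unfamiliar with the term: zero copy means the kernel moves the bytes between channels without ever surfacing them in a user-space buffer. This is only a stdlib illustration of the concept via FileChannel.transferTo (not what HttpAsyncClient does internally, and the class and method names are mine):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {

    // Copy src to dst via FileChannel.transferTo: the JVM may use a
    // sendfile-style kernel transfer, so the bytes need not pass
    // through a Java-side buffer at all.
    static long zeroCopy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long size = in.size();
            // transferTo may move fewer bytes than requested; loop until done
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", null);
        Files.write(src, new byte[] {1, 2, 3, 4, 5});
        Path dst = Files.createTempFile("dst", null);
        System.out.println(zeroCopy(src, dst)); // prints 5
    }
}
```

With a plain InputStream/OutputStream copy, every chunk is read into a heap byte[] and written back out; transferTo avoids that round trip, which is presumably what lets curl saturate the link.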
(2) Use larger socket / intermediate buffers. Default buffer size used
by Entity implementations is most likely suboptimal.
That did not make any difference. I have:
1. changed the socket receive buffer size
2. employed a buffered input stream
3. manually copied the stream to a file
I have varied the buffer size from 2^14 to 2^20 bytes, to no avail.
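In code, steps 2 and 3 above were essentially the classic buffered copy loop; a minimal sketch with a parameterised buffer size (the names are mine, not the original code):

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedCopy {

    // Copy in to out through an intermediate buffer of bufSize bytes and
    // return the number of bytes copied. Varying bufSize between
    // 2^14 and 2^20 made no measurable difference in my tests.
    static long copy(InputStream in, OutputStream out, int bufSize)
            throws IOException {
        InputStream buffered = new BufferedInputStream(in, bufSize);
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        while ((n = buffered.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```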
Regardless of this, your tip with zero copy helped me a lot.
Unfortunately, this is just one little piece in a performance degradation
chain a colleague has tracked down. HttpClient acts as an intermediary in
a webapp which receives a request via REST from a client, processes it,
and opens a stream to the huge files on a remote server. Without
caching the files to disk, I pass the Entity#getContent stream
back to the client. The degradation is about 75 %.
After rethinking your tips, I checked the servers I am pulling data
from. One is slow, the other one is fast. After identifying an issue
with my custom HttpResponseInputStream, the transfer speed when piping
the streams from the fast server remains at 8 MB/s, which is what I wanted.
I modified my code to use the async client and it seems to pipe at
maximum LAN speed, though it now looks weird with curl: curl blocks for
15 seconds, and then the entire stream is written to disk within a second.
But anyway, you helped me a lot in sorting out this issue. I will think
about the async stuff. I am not very experienced with it; the
handling/API looks different, and it has to fit into the model mentioned
above.
[1] https://hc.apache.org/httpcomponents-asyncclient-4.1.x/quickstart.html
Michael
---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscr...@hc.apache.org
For additional commands, e-mail: httpclient-users-h...@hc.apache.org