On Sun, 2015-05-24 at 13:02 +0200, Michael Osipov wrote:
> On 2015-05-24 at 12:17, Oleg Kalnichevski wrote:
> > On Sun, 2015-05-24 at 00:29 +0200, Michael Osipov wrote:
> >> On 2015-05-23 at 22:29, Oleg Kalnichevski wrote:
> >>> On Sat, 2015-05-23 at 22:09 +0200, Michael Osipov wrote:
> >>>> Hi,
> >>>>
> >>>> we are experiencing a (slight) performance problem with HttpClient
> >>>> 4.4.1 while downloading big files from a remote server in the
> >>>> corporate intranet.
> >>>>
> >>>> A simple test client:
> >>>>
> >>>> HttpClientBuilder builder = HttpClientBuilder.create();
> >>>> try (CloseableHttpClient client = builder.build()) {
> >>>>     HttpGet get = new HttpGet("...");
> >>>>     long start = System.nanoTime();
> >>>>     HttpResponse response = client.execute(get);
> >>>>     HttpEntity entity = response.getEntity();
> >>>>
> >>>>     File file = File.createTempFile("prefix", null);
> >>>>     OutputStream os = new FileOutputStream(file);
> >>>>     entity.writeTo(os);
> >>>>     long stop = System.nanoTime();
> >>>>     long contentLength = file.length();
> >>>>
> >>>>     long diff = stop - start;
> >>>>     System.out.printf("Duration: %d ms%n", TimeUnit.NANOSECONDS.toMillis(diff));
> >>>>     System.out.printf("Size: %d%n", contentLength);
> >>>>
> >>>>     float speed = contentLength / (float) diff * (1_000_000_000 / 1_000_000);
> >>>>
> >>>>     System.out.printf("Speed: %.2f MB/s%n", speed);
> >>>> }
> >>>>
> >>>> After at least 10 repetitions I see that the 182 MB file is downloaded
> >>>> within 24 000 ms at about 8 MB/s max. I cannot top that.
> >>>>
> >>>> I have tried this over and over again with curl and see that curl is
> >>>> able to saturate the entire LAN connection (100 Mbit/s).
> >>>>
> >>>> My tests are done on Windows 7 64 bit, JDK 7u67 32 bit.
> >>>>
> >>>> Any idea what the bottleneck might be?
> >>
> >> Thanks for the quick response.
> >>
> >>> (1) Curl should be using zero copy file transfer, which Java blocking
> >>> i/o does not support. HttpAsyncClient on the other hand supports zero
> >>> copy file transfer and generally tends to perform better when writing
> >>> content out directly to the disk.
> >>
> >> I did try this [1] example and my heap exploded. After increasing it to
> >> -Xmx1024M, it did saturate the entire connection.
> >
> > This sounds wrong. The example below does not use zero copy (with zero
> > copy there should be no heap memory allocation at all).
> >
> > This example demonstrates how to use zero copy file transfer:
> >
> > http://hc.apache.org/httpcomponents-asyncclient-4.1.x/httpasyncclient/examples/org/apache/http/examples/nio/client/ZeroCopyHttpExchange.java
>
> I have seen this example, but there is no ZeroCopyGet. I haven't found
> any example which explicitly says use zero copy for GETs. The example
> from [1] did work, but with the explosion. What did I do wrong here?

Zero copy can be employed only if a message encloses an entity in it.
Therefore there is no such thing as ZeroCopyGet in HC. One can execute a
normal GET request and use a ZeroCopyConsumer to stream content out
directly to a file without any intermediate buffering in memory.
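For illustration, a minimal sketch of the GET-plus-ZeroCopyConsumer approach
described above, modelled on the HttpAsyncClient 4.1 API. The class name,
target URL and temp file are placeholders introduced here, not part of the
original thread.

import java.io.File;
import java.util.concurrent.Future;

import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.entity.ContentType;
import org.apache.http.impl.nio.client.CloseableHttpAsyncClient;
import org.apache.http.impl.nio.client.HttpAsyncClients;
import org.apache.http.nio.client.methods.HttpAsyncMethods;
import org.apache.http.nio.client.methods.ZeroCopyConsumer;

public class ZeroCopyGetSketch {

    public static void main(String[] args) throws Exception {
        // Placeholder URL and target file -- substitute your own.
        String url = "http://example.com/big-file.bin";
        File download = File.createTempFile("download", null);

        try (CloseableHttpAsyncClient client = HttpAsyncClients.createDefault()) {
            client.start();

            // The consumer streams the response body directly to the file,
            // without buffering the content in heap memory.
            ZeroCopyConsumer<File> consumer = new ZeroCopyConsumer<File>(download) {
                @Override
                protected File process(
                        HttpResponse response,
                        File file,
                        ContentType contentType) throws Exception {
                    if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
                        throw new ClientProtocolException(
                                "Download failed: " + response.getStatusLine());
                    }
                    return file;
                }
            };

            // An ordinary GET request producer; the zero-copy part is on the
            // response side, handled by the consumer above.
            Future<File> future = client.execute(
                    HttpAsyncMethods.createGet(url), consumer, null);
            File result = future.get();
            System.out.println("Saved " + result.length() + " bytes to " + result);
        }
    }
}

Note that the request itself carries no entity; it is the consumer that gives
the download its zero-copy behaviour.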
> >>> (2) Use larger socket / intermediate buffers. Default buffer size
> >>> used by Entity implementations is most likely suboptimal.
> >>
> >> That did not make any difference. I have:
> >>
> >> 1. changed the socket receive buffer size,
> >> 2. employed a buffered input stream,
> >> 3. manually copied the stream to a file.
> >>
> >> I have varied the buffer size from 2^14 to 2^20 bytes, to no avail.
> >> Regardless of this, your tip with zero copy helped me a lot.
> >>
> >> Unfortunately, this is just a little piece in a performance degradation
> >> chain a colleague has figured out. HttpClient acts as an intermediary in
> >> a webapp which receives a request via REST from a client, processes it
> >> and opens up the stream to the huge files from a remote server. Without
> >> caching the files to disk, I am passing the Entity#getContent stream
> >> back to the client. The degradation is about 75 %.
> >>
> >> After rethinking your tips, I just checked the servers I am pulling
> >> data off. One is slow, the other one is fast. Transfer speed when
> >> piping the streams from the fast server remains at 8 MB/s, which is
> >> what I wanted after I had identified an issue with my custom
> >> HttpResponseInputStream.
> >>
> >> I modified my code to use the async client and it seems to pipe at
> >> maximum LAN speed, though it looks weird with curl now. Curl blocks
> >> for 15 seconds and within a second the entire stream is written down
> >> to disk.
> >
> > It all sounds very bizarre. I see no reason why HttpAsyncClient without
> > zero copy transfer should do any better than HttpClient in this
> > scenario.
>
> So you are saying something is probably wrong with my client setup?

I think it is not unlikely.

Oleg

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscr...@hc.apache.org
For additional commands, e-mail: httpclient-users-h...@hc.apache.org
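As a closing reference for point (2) above -- larger socket and intermediate
buffers -- a hedged sketch of how such settings can be applied with the
blocking HttpClient 4.4 builders. The class name, URL and the 64 KB buffer
values are illustrative choices made here, not recommendations from the
thread (which reports that varying them gave no improvement).

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.config.ConnectionConfig;
import org.apache.http.config.SocketConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class TunedBufferSketch {

    public static void main(String[] args) throws Exception {
        // Illustrative 64 KB buffers.
        SocketConfig socketConfig = SocketConfig.custom()
                .setRcvBufSize(64 * 1024)      // SO_RCVBUF hint for the socket
                .build();
        ConnectionConfig connectionConfig = ConnectionConfig.custom()
                .setBufferSize(64 * 1024)      // session buffer of the connection
                .build();

        try (CloseableHttpClient client = HttpClients.custom()
                .setDefaultSocketConfig(socketConfig)
                .setDefaultConnectionConfig(connectionConfig)
                .build()) {

            HttpGet get = new HttpGet("http://example.com/big-file.bin"); // placeholder
            try (CloseableHttpResponse response = client.execute(get)) {
                HttpEntity entity = response.getEntity();

                // Copy with a larger intermediate buffer instead of
                // entity.writeTo(), which uses a small default buffer.
                File file = File.createTempFile("download", null);
                byte[] buffer = new byte[64 * 1024];
                try (InputStream in = entity.getContent();
                     OutputStream out = new FileOutputStream(file)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
                System.out.println("Saved " + file.length() + " bytes to " + file);
            }
        }
    }
}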