[In the interest of not hijacking Tony's discussion thread, I'm putting this into a new email.]

Tony Poppleton wrote:
Hi,
Further to the previous mail, I have already implemented my own AbstractHttpEntity to eliminate a byte[] copy. And I have seen the NIO implementations of HttpEntities, however they don't seem to copy using NIO methods so they won't be any faster than the standard IO implementations. Anyway, it seems I have to go a level deeper than this class to be able to do the NIO copy. Is this the right direction to be digging in?
Thanks,
Tony

Tony

Contrary to a common misconception, NIO is significantly slower than the classic blocking I/O in terms of raw data throughput. Modern operating systems and JVMs have become pretty efficient at switching thread contexts. Connection multiplexing starts paying off only when the number of concurrent connections exceeds 2000 or direct data streaming from or to a file is used.

I agree that NIO is often incorrectly viewed as a panacea for all network performance issues.

I did want to mention that there are some multi-threading performance issues that NIO could potentially avoid, for those who are using HttpClient with hundreds of threads.

For example, during a Bixo crawl with 300 threads, I was doing regular thread dumps and inspecting the results. A very high percentage of the threads (typically more than a third) were blocked while waiting for access to the cookie store. By default there's only one of these per HttpClient.

This one was fairly easy to work around, by creating a cookie store in the local context for each request:

    CookieStore cookieStore = new BasicCookieStore();
    localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);

But I've run into a few other synchronized method/data bottlenecks, which I'm still working through. For example, at irregular intervals the bulk of my fetcher threads are blocked on getting the scheme registry, either:

"pool-1-thread-9478" prio=10 tid=0x8e9ec400 nid=0x1fb waiting for monitor entry [0x8ee2e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.http.conn.scheme.SchemeRegistry.get(SchemeRegistry.java:106)
    - waiting to lock <0x93f2c0c8> (a org.apache.http.conn.scheme.SchemeRegistry)
    at org.apache.http.client.protocol.RequestAddCookies.process(RequestAddCookies.java:154)
    at org.apache.http.protocol.BasicHttpProcessor.process(BasicHttpProcessor.java:251)
    at org.apache.http.protocol.HttpRequestExecutor.preProcess(HttpRequestExecutor.java:168)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:422)

or

"pool-1-thread-9470" prio=10 tid=0x8e9e7c00 nid=0x1f1 waiting for monitor entry [0x8d986000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.http.conn.scheme.SchemeRegistry.getScheme(SchemeRegistry.java:71)
    - waiting to lock <0x93f2c0c8> (a org.apache.http.conn.scheme.SchemeRegistry)
    at org.apache.http.impl.conn.DefaultHttpRoutePlanner.determineRoute(DefaultHttpRoutePlanner.java:111)
    at org.apache.http.impl.client.DefaultRequestDirector.determineRoute(DefaultRequestDirector.java:619)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:319)
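
For anyone who wants to check for the same kind of contention, the tally described above (how many threads are blocked, and on which lock) could also be collected in-process rather than by reading thread dumps by hand. What follows is only a rough sketch using the JDK's standard ThreadMXBean; the class name BlockedThreadTally and the method tallyBlocked are made up for illustration and are not part of Bixo or HttpClient.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Bixo or HttpClient): counts how many
// threads are currently BLOCKED, grouped by the class of the contended lock.
public class BlockedThreadTally {

    public static Map<String, Integer> tallyBlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        // false, false: skip owned-monitor and synchronizer details,
        // which are not needed just to see what each thread is waiting on.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            LockInfo lock = info.getLockInfo();
            String key = (lock != null) ? lock.getClassName() : "unknown";
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per contended lock class; prints nothing if no thread is blocked.
        tallyBlocked().forEach((lock, n) -> System.out.println(n + " blocked on " + lock));
    }
}
```

Calling tallyBlocked() periodically from a monitoring thread during a crawl should show entries such as org.apache.http.conn.scheme.SchemeRegistry climbing whenever the fetcher threads pile up on that monitor, assuming the contention looks like the dumps above.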

If anybody (well, OK, Oleg) has input on things I could be doing wrong to trigger this type of behavior, and/or ways to avoid it, I'm all ears.

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g



