On 6 January 2012 22:07, Ken Krugler <kkrugler_li...@transpac.com> wrote:
>
> On Jan 6, 2012, at 1:01pm, Oleg Kalnichevski wrote:
>
>> On Fri, 2012-01-06 at 11:06 -0500, Dan Checkoway wrote:
>>> Hello,
>>>
>>> I have an app that needs to make concurrent HTTP requests to a web
>>> service using persistent (keepalive) connections. I'm using
>>> ThreadSafeClientConnManager. I ran into a performance bottleneck, and
>>> I believe I've pinpointed the issue...
>>>
>>> Affects Version(s): HttpCore 4.1.3, HttpClient 4.1.2
>>>
>>> I construct my connection manager and client like this:
>>>
>>>     connMgr = new ThreadSafeClientConnManager(
>>>             SchemeRegistryFactory.createDefault(), -1, TimeUnit.MILLISECONDS);
>>>     connMgr.setMaxTotal(400);
>>>     connMgr.setDefaultMaxPerRoute(400);
>>>
>>>     httpClient = new DefaultHttpClient(connMgr);
>>>
>>> Note that this app only talks to a single URI on a single server --
>>> thus defaultMaxPerRoute == maxTotal, which I think is correct...
>>> please let me know if that's bad!
>>>
>>> Anyway, my app has a pool of 400 threads and generally performs quite
>>> well. But when all 400 threads need a connection concurrently,
>>> performance suffers. I've narrowed it down to contention caused by
>>> blocking calls in the connection manager. For example, a thread dump
>>> shows...
>>>
>>> About half my threads are "stuck" (well, not stuck, but slow &
>>> waiting) here:
>>>
>>> "catalina-exec-347" daemon prio=10 tid=0x00007f3a54065000 nid=0x6b73 waiting on condition [0x00007f3a29b9a000]
>>>    java.lang.Thread.State: WAITING (parking)
>>>     at sun.misc.Unsafe.park(Native Method)
>>>     - parking to wait for <0x00000006147c8318> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>>>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>>>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>>>     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute.freeEntry(ConnPoolByRoute.java:438)
>>>     at org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager.releaseConnection(ThreadSafeClientConnManager.java:276)
>>>     - locked <0x000000062048ebc8> (a org.apache.http.impl.conn.tsccm.BasicPooledConnAdapter)
>>>     at org.apache.http.impl.conn.AbstractClientConnAdapter.releaseConnection(AbstractClientConnAdapter.java:308)
>>>     - locked <0x000000062048ebc8> (a org.apache.http.impl.conn.tsccm.BasicPooledConnAdapter)
>>>     at org.apache.http.conn.BasicManagedEntity.releaseManagedConnection(BasicManagedEntity.java:181)
>>>     at org.apache.http.conn.BasicManagedEntity.eofDetected(BasicManagedEntity.java:142)
>>>     at org.apache.http.conn.EofSensorInputStream.checkEOF(EofSensorInputStream.java:211)
>>>     at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:139)
>>>     ...
>>>
>>> While the other half are "stuck" here:
>>>
>>> "catalina-exec-346" daemon prio=10 tid=0x00007f3a4c05d000 nid=0x6b72 waiting on condition [0x00007f3a29c9b000]
>>>    java.lang.Thread.State: WAITING (parking)
>>>     at sun.misc.Unsafe.park(Native Method)
>>>     - parking to wait for <0x00000006147c8318> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>>>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>>>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>>>     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:337)
>>>     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:300)
>>>     at org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:224)
>>>     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:401)
>>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
>>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:941)
>>>     ...
>>>
>>> It's not a deadlock per se. It's just a bottleneck, and it is causing
>>> very high latency in my app. Below a certain threshold, i.e. when not
>>> all 400 threads need a connection concurrently, things are fine. But
>>> when all 400 need a connection at once, that's when it gets painful.
>>>
>>> I'm wondering if it might be feasible to switch to using non-blocking
>>> calls for this, i.e. with ConcurrentHashMap and/or
>>> ConcurrentLinkedQueue, or something of that nature? I haven't dived
>>> into the source code yet, so don't slap me too hard if that
>>> suggestion was way out of line. :-)
>>>
>>> Do you have any suggestions, in terms of ways I might be able to work
>>> around this bottleneck otherwise?
>>>
>>> Thanks!
>>>
>>> Dan Checkoway
>>
>> Hi Dan
>>
>> Yes, your observation is correct. The problem is that the connection
>> pool is guarded by a global lock. Naturally, if you have 400 threads
>> trying to obtain a connection at about the same time, all of them end
>> up contending for one lock. The problem is that I can't think of a
>> different way to guarantee that the max limits (per route and total)
>> are not exceeded. If anyone can think of a better algorithm, please do
>> let me know. One possibility might be a more lenient implementation,
>> less prone to lock contention, that may under stress occasionally
>> allocate a few more connections than the max limits allow.
>
> I'd also run into a similar situation during web crawling, when I had
> 300+ threads sharing one connection pool.
>
> Would it work to go for finer-grained locking, by using atomic counters
> to track & enforce limits on per route/total connections?
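Atomic counters can enforce a cap without a global lock, but the limit
check and the increment have to happen as a single atomic step, i.e. a
compare-and-set loop rather than a plain incrementAndGet(). A rough,
untested sketch of that idea (plain Java; ConnLimiter, tryAcquire and
release are made-up names, not HttpClient API):

    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch: enforce a hard cap on leased connections with a CAS loop
    // instead of a global lock.
    class ConnLimiter {
        private final int max;
        private final AtomicInteger leased = new AtomicInteger(0);

        ConnLimiter(int max) {
            this.max = max;
        }

        // Reserve one slot; returns false once the cap is reached.
        boolean tryAcquire() {
            for (;;) {
                int current = leased.get();
                if (current >= max) {
                    return false; // at the limit; caller must wait or fail
                }
                if (leased.compareAndSet(current, current + 1)) {
                    return true; // slot reserved
                }
                // lost the race to another thread; re-read and retry
            }
        }

        // Give the slot back when a connection is released or closed.
        void release() {
            leased.decrementAndGet();
        }
    }

The catch is that a thread whose tryAcquire() fails still needs somewhere
to wait until a connection is freed, which is exactly what the current
lock/condition arrangement in ConnPoolByRoute provides.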
If the per-route limit is likely to be reached, it might help to have a
lock per route, and to grab the global lock only when the route limit has
not been reached (i.e. only when the total count still has to be
checked). However, this won't help unless the per-route limits are
reached sufficiently often. A rough sketch of per-route locks is below,
after the signature.

> -- Ken
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr
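To make the per-route idea concrete, here is a rough, untested sketch of
lazily created per-route locks (plain Java; RouteLocks and forRoute are
made-up names, not HttpClient API):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.locks.ReentrantLock;

    // Sketch: one lock per route, created on demand. Each route's lock
    // would guard only that route's free list and leased count; the
    // global lock would be taken only when a new connection has to be
    // created and the total count checked.
    class RouteLocks<R> {
        private final ConcurrentMap<R, ReentrantLock> locks =
                new ConcurrentHashMap<R, ReentrantLock>();

        ReentrantLock forRoute(R route) {
            ReentrantLock lock = locks.get(route);
            if (lock == null) {
                ReentrantLock fresh = new ReentrantLock();
                ReentrantLock existing = locks.putIfAbsent(route, fresh);
                lock = (existing != null) ? existing : fresh; // keep the winner
            }
            return lock;
        }
    }

Note that in an app like Dan's, which only ever talks to one route, a
per-route lock degenerates into a single global lock again, so this only
helps many-route workloads such as crawling.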