On Jan 6, 2012, at 1:01pm, Oleg Kalnichevski wrote:

> On Fri, 2012-01-06 at 11:06 -0500, Dan Checkoway wrote:
>> Hello,
>>
>> I have an app that needs to make concurrent HTTP requests to a web service
>> using persistent (keepalive) connections. I'm using
>> ThreadSafeClientConnManager. I ran into a performance bottleneck, and I
>> believe I've pinpointed the issue...
>>
>> Affects Version(s): HttpCore 4.1.3, HttpClient 4.1.2
>>
>> I construct my connection manager and client like this:
>>
>>     connMgr = new ThreadSafeClientConnManager(
>>         SchemeRegistryFactory.createDefault(), -1, TimeUnit.MILLISECONDS);
>>     connMgr.setMaxTotal(400);
>>     connMgr.setDefaultMaxPerRoute(400);
>>
>>     httpClient = new DefaultHttpClient(connMgr);
>>
>> Note that this app only talks to a single URI on a single server -- thus
>> defaultMaxPerRoute == maxTotal, which I think is correct...please let me
>> know if that's bad!
>>
>> Anyway, my app has a pool of 400 threads and generally performs quite
>> well. But when all 400 threads need a connection concurrently, performance
>> suffers. I've narrowed it down to contention caused by blocking calls in
>> the connection manager. For example, a thread dump shows...
>>
>> About half my threads are "stuck" (well, not stuck, but slow & waiting)
>> here:
>>
>> "catalina-exec-347" daemon prio=10 tid=0x00007f3a54065000 nid=0x6b73 waiting on condition [0x00007f3a29b9a000]
>>    java.lang.Thread.State: WAITING (parking)
>>     at sun.misc.Unsafe.park(Native Method)
>>     - parking to wait for <0x00000006147c8318> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>>     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute.freeEntry(ConnPoolByRoute.java:438)
>>     at org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager.releaseConnection(ThreadSafeClientConnManager.java:276)
>>     - locked <0x000000062048ebc8> (a org.apache.http.impl.conn.tsccm.BasicPooledConnAdapter)
>>     at org.apache.http.impl.conn.AbstractClientConnAdapter.releaseConnection(AbstractClientConnAdapter.java:308)
>>     - locked <0x000000062048ebc8> (a org.apache.http.impl.conn.tsccm.BasicPooledConnAdapter)
>>     at org.apache.http.conn.BasicManagedEntity.releaseManagedConnection(BasicManagedEntity.java:181)
>>     at org.apache.http.conn.BasicManagedEntity.eofDetected(BasicManagedEntity.java:142)
>>     at org.apache.http.conn.EofSensorInputStream.checkEOF(EofSensorInputStream.java:211)
>>     at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:139)
>>     ...
>>
>> While the other half are "stuck" here:
>>
>> "catalina-exec-346" daemon prio=10 tid=0x00007f3a4c05d000 nid=0x6b72 waiting on condition [0x00007f3a29c9b000]
>>    java.lang.Thread.State: WAITING (parking)
>>     at sun.misc.Unsafe.park(Native Method)
>>     - parking to wait for <0x00000006147c8318> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>>     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:337)
>>     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:300)
>>     at org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:224)
>>     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:401)
>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:941)
>>     ...
>>
>> It's not a deadlock per se. It's just a bottleneck, and it is causing very
>> high latency in my app. Below a certain threshold, i.e. when not all 400
>> threads need a connection concurrently, things are fine. But when all 400
>> need a connection at once, that's when it gets painful.
>>
>> I'm wondering if it might be feasible to switch to using non-blocking
>> calls for this, i.e.
>> with ConcurrentHashMap and/or ConcurrentLinkedQueue, or something of that
>> nature? I haven't dived into the source code yet, so don't slap me too
>> hard if that suggestion was way out of line. :-)
>>
>> Do you have any suggestions for ways I might be able to work around this
>> bottleneck otherwise?
>>
>> Thanks!
>>
>> Dan Checkoway
>
> Hi Dan,
>
> Yes, your observation is correct. The problem is that the connection pool
> is guarded by a global lock. Naturally, if you have 400 threads trying to
> obtain a connection at about the same time, all of them end up contending
> for that one lock. The trouble is that I can't think of a different way
> to guarantee that the max limits (per route and total) are never
> exceeded. If anyone can think of a better algorithm, please do let me
> know. One possibility might be a more lenient implementation, less prone
> to lock contention, that may under stress occasionally allocate a few
> more connections than the max limits allow.
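[Editor's note: a minimal sketch of the kind of "more lenient" pool Oleg describes, combining the structures Dan suggests. A Semaphore strictly bounds the total connection count, while the idle free list is a lock-free ConcurrentLinkedQueue, so acquire and release never serialize on one global ReentrantLock. `LenientPool` and its method names are hypothetical, not HttpClient code, and per-route limits are omitted for brevity.]

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: bound the pool with a Semaphore and keep idle
// connections in a lock-free queue, avoiding a single global lock.
class LenientPool<T> {
    private final Semaphore permits;                 // strictly enforces maxTotal
    private final Queue<T> idle = new ConcurrentLinkedQueue<>();

    LenientPool(int maxTotal) {
        this.permits = new Semaphore(maxTotal);
    }

    /** Returns an idle entry, or null meaning "caller may open a new connection". */
    T acquire() {
        permits.acquireUninterruptibly();            // blocks only when the pool is exhausted
        return idle.poll();                          // lock-free; null => create a new conn
    }

    void release(T entry) {
        idle.offer(entry);                           // lock-free return to the free list
        permits.release();
    }
}
```

The trade-off Oleg mentions shows up here: with only a counting semaphore there is no per-route accounting, so a stricter variant would need extra bookkeeping (and potentially re-introduce contention) to enforce both limits exactly.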
I'd also run into a similar situation during web crawling, when I had 300+
threads sharing one connection pool. Would it work to go for finer-grained
locking, using atomic counters to track & enforce the limits on per-route
and total connections?

-- Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
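[Editor's note: Ken's atomic-counter idea might look roughly like the sketch below. Each limit is an AtomicInteger that is optimistically incremented and rolled back when a limit would be exceeded, so no lock is ever held. `RouteLimiter` is a hypothetical name; a real pool would keep one per-route counter per route, and would still need some coordination for the free list itself.]

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: enforce per-route and total limits with atomic
// counters instead of a global lock (increment, then roll back on overflow).
class RouteLimiter {
    private final AtomicInteger total = new AtomicInteger();
    private final AtomicInteger route = new AtomicInteger(); // one per route in practice
    private final int maxTotal;
    private final int maxPerRoute;

    RouteLimiter(int maxTotal, int maxPerRoute) {
        this.maxTotal = maxTotal;
        this.maxPerRoute = maxPerRoute;
    }

    /** True if the caller may open a connection; false if a limit is reached. */
    boolean tryAcquire() {
        if (total.incrementAndGet() > maxTotal) {
            total.decrementAndGet();                 // roll back: total limit reached
            return false;
        }
        if (route.incrementAndGet() > maxPerRoute) {
            route.decrementAndGet();                 // roll back both counters
            total.decrementAndGet();
            return false;
        }
        return true;
    }

    void release() {
        route.decrementAndGet();
        total.decrementAndGet();
    }
}
```

Note the limits stay hard here, but callers that lose the race get a `false` and must retry or wait; turning that into the blocking behavior of `getEntryBlocking` without re-introducing a shared wait-set is the part that remains tricky.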