On Jan 6, 2012, at 1:01pm, Oleg Kalnichevski wrote:

> On Fri, 2012-01-06 at 11:06 -0500, Dan Checkoway wrote:
>> Hello,
>> 
>> I have an app that needs to make concurrent HTTP requests to a web service
>> using persistent (keepalive) connections.  I'm using
>> ThreadSafeClientConnManager.  I ran into a performance bottleneck, and I
>> believe I've pinpointed the issue...
>> 
>> Affects Version(s): HttpCore 4.1.3, HttpClient 4.1.2
>> 
>> I construct my connection manager and client like this:
>> 
>>     connMgr = new ThreadSafeClientConnManager(
>>             SchemeRegistryFactory.createDefault(), -1, TimeUnit.MILLISECONDS);
>>     connMgr.setMaxTotal(400);
>>     connMgr.setDefaultMaxPerRoute(400);
>> 
>>     httpClient = new DefaultHttpClient(connMgr);
>> 
>> Note that this app only talks to a single URI on a single server -- thus
>> defaultMaxPerRoute == maxTotal, which I think is correct...please let me
>> know if that's bad!
>> 
>> Anyway, my app has a pool of 400 threads and generally performs quite
>> well.  But when all 400 threads need a connection concurrently, performance
>> suffers.  I've narrowed it down to contention caused by blocking calls in
>> the connection manager.  For example...a thread dump shows...
>> 
>> About half my threads are "stuck" (well, not stuck, but slow & waiting)
>> here:
>> 
>> "catalina-exec-347" daemon prio=10 tid=0x00007f3a54065000 nid=0x6b73 waiting on condition [0x00007f3a29b9a000]
>>    java.lang.Thread.State: WAITING (parking)
>>     at sun.misc.Unsafe.park(Native Method)
>>     - parking to wait for  <0x00000006147c8318> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>>     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute.freeEntry(ConnPoolByRoute.java:438)
>>     at org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager.releaseConnection(ThreadSafeClientConnManager.java:276)
>>     - locked <0x000000062048ebc8> (a org.apache.http.impl.conn.tsccm.BasicPooledConnAdapter)
>>     at org.apache.http.impl.conn.AbstractClientConnAdapter.releaseConnection(AbstractClientConnAdapter.java:308)
>>     - locked <0x000000062048ebc8> (a org.apache.http.impl.conn.tsccm.BasicPooledConnAdapter)
>>     at org.apache.http.conn.BasicManagedEntity.releaseManagedConnection(BasicManagedEntity.java:181)
>>     at org.apache.http.conn.BasicManagedEntity.eofDetected(BasicManagedEntity.java:142)
>>     at org.apache.http.conn.EofSensorInputStream.checkEOF(EofSensorInputStream.java:211)
>>     at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:139)
>>     ...
>> 
>> While the other half are "stuck" here:
>> 
>> "catalina-exec-346" daemon prio=10 tid=0x00007f3a4c05d000 nid=0x6b72 waiting on condition [0x00007f3a29c9b000]
>>    java.lang.Thread.State: WAITING (parking)
>>     at sun.misc.Unsafe.park(Native Method)
>>     - parking to wait for  <0x00000006147c8318> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>>     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:337)
>>     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:300)
>>     at org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:224)
>>     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:401)
>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:941)
>>     ...
>> 
>> It's not a deadlock per se.  It's just a bottleneck, and it is causing very
>> high latency in my app.  Below a certain threshold, i.e. when not all 400
>> threads need a connection concurrently, things are fine.  But when all 400
>> need a connection at once, that's when it gets painful.
>> 
>> I'm wondering if it might be feasible to switch to using non-blocking calls
>> for this, i.e. with ConcurrentHashMap and/or ConcurrentLinkedQueue, or
>> something of that nature?  I haven't dived into the source code yet, so
>> don't slap me too hard if that suggestion was way out of line.  :-)
>> 
>> Do you have any suggestions, in terms of ways I might be able to work
>> around this bottleneck otherwise?
>> 
>> Thanks!
>> 
>> Dan Checkoway
> 
> Hi Dan
> 
> Yes, your observation is correct. The problem is that the connection
> pool is guarded by a global lock. Naturally, if you have 400 threads
> trying to obtain a connection at about the same time, all of them end up
> contending for that single lock. The trouble is that I can't think of a
> different way to guarantee that the max limits (per route and total) are
> never exceeded. If anyone can think of a better algorithm, please do let
> me know. One possibility might be a more lenient implementation, less
> prone to lock contention, that may under stress occasionally allocate a
> few more connections than the max limits allow.
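The lenient scheme described above could be sketched as a lock-free free-list plus an atomic counter whose check-then-increment is deliberately done in two steps, so a burst of threads may briefly overshoot maxTotal. This is just a sketch of the trade-off; the class and method names are hypothetical, not HttpClient API:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical lenient pool: no global lock, but the check-then-act on
// 'allocated' is not atomic, so a burst of threads can overshoot maxTotal
// by a few connections -- the trade-off described above.
class LenientPool<T> {
    private final ConcurrentLinkedQueue<T> free = new ConcurrentLinkedQueue<>();
    private final AtomicInteger allocated = new AtomicInteger(0);
    private final int maxTotal;
    private final Supplier<T> factory;

    LenientPool(int maxTotal, Supplier<T> factory) {
        this.maxTotal = maxTotal;
        this.factory = factory;
    }

    public T acquire() {
        T conn = free.poll();              // fast path: reuse a pooled connection
        if (conn != null) return conn;
        if (allocated.get() < maxTotal) {  // racy check: several threads may pass...
            allocated.incrementAndGet();   // ...so the pool may briefly exceed maxTotal
            return factory.get();
        }
        return null; // a real implementation would wait or retry here
    }

    public void release(T conn) {
        free.offer(conn);
    }

    public int allocatedCount() { return allocated.get(); }
}
```

The point is that neither acquire() nor release() ever takes a lock shared by all 400 threads; the cost is that the limit becomes a soft target rather than a hard guarantee.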

I'd also run into a similar situation during web crawling, when I had 300+ 
threads sharing one connection pool.

Would it work to go for finer-grained locking, using atomic counters to 
track and enforce the per-route and total connection limits?
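A strict variant of that atomic-counter idea is a compare-and-set retry loop, which never exceeds the limit yet still avoids a global lock. Again a hypothetical sketch, not HttpClient code (a per-route version would keep one such counter per route plus one for the total):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical strict limiter: a CAS retry loop reserves a slot atomically,
// so the count can never exceed maxTotal even without a global lock.
class AtomicLimiter {
    private final AtomicInteger inUse = new AtomicInteger(0);
    private final int maxTotal;

    AtomicLimiter(int maxTotal) { this.maxTotal = maxTotal; }

    // Try to reserve one connection slot; false means the limit is reached.
    public boolean tryAcquire() {
        for (;;) {
            int current = inUse.get();
            if (current >= maxTotal) return false;
            if (inUse.compareAndSet(current, current + 1)) return true;
            // CAS lost a race with another thread; re-read and retry
        }
    }

    public void release() {
        inUse.decrementAndGet();
    }

    public int inUseCount() { return inUse.get(); }
}
```

Contention then shifts from a parked-thread lock queue to brief CAS retries, which are typically much cheaper when most acquisitions succeed on the first attempt.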

-- Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr



