On 6 January 2012 22:07, Ken Krugler <kkrugler_li...@transpac.com> wrote:
>
> On Jan 6, 2012, at 1:01pm, Oleg Kalnichevski wrote:
>
>> On Fri, 2012-01-06 at 11:06 -0500, Dan Checkoway wrote:
>>> Hello,
>>>
>>> I have an app that needs to make concurrent HTTP requests to a web
>>> service using persistent (keepalive) connections. I'm using
>>> ThreadSafeClientConnManager. I ran into a performance bottleneck, and
>>> I believe I've pinpointed the issue...
>>>
>>> Affects Version(s): HttpCore 4.1.3, HttpClient 4.1.2
>>>
>>> I construct my connection manager and client like this:
>>>
>>>     connMgr = new ThreadSafeClientConnManager(
>>>             SchemeRegistryFactory.createDefault(), -1, TimeUnit.MILLISECONDS);
>>>     connMgr.setMaxTotal(400);
>>>     connMgr.setDefaultMaxPerRoute(400);
>>>
>>>     httpClient = new DefaultHttpClient(connMgr);
>>>
>>> Note that this app only talks to a single URI on a single server --
>>> thus defaultMaxPerRoute == maxTotal, which I think is correct...
>>> please let me know if that's bad!
>>>
>>> Anyway, my app has a pool of 400 threads and generally performs quite
>>> well. But when all 400 threads need a connection concurrently,
>>> performance suffers. I've narrowed it down to contention caused by
>>> blocking calls in the connection manager. For example, a thread dump
>>> shows...
>>>
>>> About half my threads are "stuck" (well, not stuck, but slow &
>>> waiting) here:
>>>
>>> "catalina-exec-347" daemon prio=10 tid=0x00007f3a54065000 nid=0x6b73 waiting on condition [0x00007f3a29b9a000]
>>>    java.lang.Thread.State: WAITING (parking)
>>>     at sun.misc.Unsafe.park(Native Method)
>>>     - parking to wait for <0x00000006147c8318> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>>>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>>>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>>>     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute.freeEntry(ConnPoolByRoute.java:438)
>>>     at org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager.releaseConnection(ThreadSafeClientConnManager.java:276)
>>>     - locked <0x000000062048ebc8> (a org.apache.http.impl.conn.tsccm.BasicPooledConnAdapter)
>>>     at org.apache.http.impl.conn.AbstractClientConnAdapter.releaseConnection(AbstractClientConnAdapter.java:308)
>>>     - locked <0x000000062048ebc8> (a org.apache.http.impl.conn.tsccm.BasicPooledConnAdapter)
>>>     at org.apache.http.conn.BasicManagedEntity.releaseManagedConnection(BasicManagedEntity.java:181)
>>>     at org.apache.http.conn.BasicManagedEntity.eofDetected(BasicManagedEntity.java:142)
>>>     at org.apache.http.conn.EofSensorInputStream.checkEOF(EofSensorInputStream.java:211)
>>>     at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:139)
>>>     ...
>>>
>>> While the other half are "stuck" here:
>>>
>>> "catalina-exec-346" daemon prio=10 tid=0x00007f3a4c05d000 nid=0x6b72 waiting on condition [0x00007f3a29c9b000]
>>>    java.lang.Thread.State: WAITING (parking)
>>>     at sun.misc.Unsafe.park(Native Method)
>>>     - parking to wait for <0x00000006147c8318> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>>>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>>>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>>>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>>>     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:337)
>>>     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:300)
>>>     at org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:224)
>>>     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:401)
>>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
>>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:941)
>>>     ...
>>>
>>> It's not a deadlock per se. It's just a bottleneck, and it is causing
>>> very high latency in my app. Below a certain threshold, i.e. when not
>>> all 400 threads need a connection concurrently, things are fine. But
>>> when all 400 need a connection at once, that's when it gets painful.
>>>
>>> I'm wondering if it might be feasible to switch to using non-blocking
>>> calls for this, i.e. with ConcurrentHashMap and/or
>>> ConcurrentLinkedQueue, or something of that nature? I haven't dived
>>> into the source code yet, so don't slap me too hard if that
>>> suggestion was way out of line. :-)
>>>
>>> Do you have any suggestions, in terms of ways I might be able to work
>>> around this bottleneck otherwise?
>>>
>>> Thanks!
>>>
>>> Dan Checkoway
>>
>> Hi Dan
>>
>> Yes, your observation is correct. The problem is that the connection
>> pool is guarded by a global lock. Naturally, if you have 400 threads
>> trying to obtain a connection at about the same time, all of them end
>> up contending for one lock. The problem is that I can't think of a
>> different way to guarantee that the max limits (per route and total)
>> are not exceeded. If anyone can think of a better algorithm, please do
>> let me know. One possibility might be a more lenient implementation,
>> less prone to lock contention, that may under stress occasionally
>> allocate a few more connections than the max limits allow.
>
> I'd also run into a similar situation during web crawling, when I had
> 300+ threads sharing one connection pool.
>
> Would it work to go for finer-grained locking, by using atomic counters
> to track & enforce limits on per route/total connections?
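Atomic counters can enforce a cap without a global lock, but the limit
check and the increment have to happen as a single atomic step, i.e. a
compare-and-set loop rather than a plain incrementAndGet(). A rough,
untested sketch of that idea (plain Java; ConnLimiter, tryAcquire and
release are made-up names, not HttpClient API):

    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch: enforce a hard cap on leased connections with a CAS loop
    // instead of a global lock.
    class ConnLimiter {
        private final int max;
        private final AtomicInteger leased = new AtomicInteger(0);

        ConnLimiter(int max) {
            this.max = max;
        }

        // Reserve one slot; returns false once the cap is reached.
        boolean tryAcquire() {
            for (;;) {
                int current = leased.get();
                if (current >= max) {
                    return false; // at the limit; caller must wait or fail
                }
                if (leased.compareAndSet(current, current + 1)) {
                    return true; // slot reserved
                }
                // lost the race to another thread; re-read and retry
            }
        }

        // Give the slot back when a connection is released or closed.
        void release() {
            leased.decrementAndGet();
        }
    }

The catch is that a thread whose tryAcquire() fails still needs somewhere
to wait until a connection is freed, which is exactly what the current
lock/condition arrangement in ConnPoolByRoute provides.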
If the per-route limit is likely to be reached, it might help to have a
lock per route, and to grab the global lock only when the route limit has
not been reached (i.e. only when the total count still has to be
checked). However, this won't help unless the per-route limits are
reached sufficiently often. A rough sketch of per-route locks is below,
after the signature.

> -- Ken
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr
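To make the per-route idea concrete, here is a rough, untested sketch of
lazily created per-route locks (plain Java; RouteLocks and forRoute are
made-up names, not HttpClient API):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.locks.ReentrantLock;

    // Sketch: one lock per route, created on demand. Each route's lock
    // would guard only that route's free list and leased count; the
    // global lock would be taken only when a new connection has to be
    // created and the total count checked.
    class RouteLocks<R> {
        private final ConcurrentMap<R, ReentrantLock> locks =
                new ConcurrentHashMap<R, ReentrantLock>();

        ReentrantLock forRoute(R route) {
            ReentrantLock lock = locks.get(route);
            if (lock == null) {
                ReentrantLock fresh = new ReentrantLock();
                ReentrantLock existing = locks.putIfAbsent(route, fresh);
                lock = (existing != null) ? existing : fresh; // keep the winner
            }
            return lock;
        }
    }

Note that in an app like Dan's, which only ever talks to one route, a
per-route lock degenerates into a single global lock again, so this only
helps many-route workloads such as crawling.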