Ldap pool issue...

Emmanuel Lécharny Fri, 26 Dec 2014 17:03:18 -0800

Hi guys,

I'm a bit stuck with an issue that Lucas has already faced (and raised a
JIRA for a few months ago), an issue I thought we had fixed. Let me explain:


Shawn informed me that he has some failures when using the LDAP API for a
project he is working on. He is conducting some performance tests with
Fortress, using the LDAP API and another LDAP server. Fortress and the
LDAP server are not at fault here: I was able to build a scenario that
reproduces the problem without Fortress and the external LDAP server.
The test was pushed this morning
(LightweightLdapConnectionPoolTest.testManyConnections()). This test
spawns T threads, each of which does a lookup of the RootDSE N times.
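
To make the shape of that test concrete, here is a minimal, stdlib-only
sketch of a "T threads, N iterations each" harness. It is not the actual
testManyConnections() code: ManyThreadsHarness, run() and the action
callback are hypothetical names, and the borrow/lookup/release step is
abstracted behind a Callable so the sketch runs without any LDAP server.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness, not the real LightweightLdapConnectionPoolTest.
final class ManyThreadsHarness {
    /**
     * Starts `threads` threads; each one calls `action` up to `iterations`
     * times and stops at its first exception.
     * Returns the number of threads that failed.
     */
    static int run(int threads, int iterations, Callable<?> action) {
        AtomicInteger failedThreads = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(threads);

        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                long start = System.currentTimeMillis();
                try {
                    for (int i = 0; i < iterations; i++) {
                        try {
                            // Stand-in for: borrow a connection, look up the RootDSE, release it
                            action.call();
                        } catch (Exception e) {
                            long elapsed = System.currentTimeMillis() - start;
                            System.out.println("Thread " + Thread.currentThread().getName()
                                + " failed after " + i + " iterations in " + elapsed + "ms: "
                                + e.getMessage());
                            failedThreads.incrementAndGet();
                            return;
                        }
                    }
                } finally {
                    // Always signal completion, whether the thread failed or not
                    done.countDown();
                }
            }).start();
        }

        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return failedThreads.get();
    }
}
```

In the real test the action would get a connection from the pool, do the
lookup and release the connection; the iteration number printed on
failure is what shows how close to the end of the loop a thread was.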

When this test is run with T=100 and various values of N, I get some
errors at the end of the test. For instance, when N = 2000, I get errors
like:



Failure to get a connection on iteration 1982 : Cannot connect on the
server: Connection refused
Thread Thread[Thread-320,5,main] failed after 1982 iterations  in 7393ms
Failure to get a connection on iteration 1974 : Cannot connect on the
server: Connection refused
...


For N = 4000, I get errors like:


Failure to get a connection on iteration 3923 : Cannot connect on the
server: Connection refused
Failure to get a connection on iteration 3969 : Cannot connect on the
server: Connection refused
...

 
For N = 10 000:

Failure to get a connection on iteration 9925 : Cannot connect on the
server: Connection refused
Failure to get a connection on iteration 9763 : Cannot connect on the
server: Connection refused
...


For N = 20 000:

Failure to get a connection on iteration 19890 : Cannot connect on the
server: Connection refused
Failure to get a connection on iteration 19956 : Cannot connect on the
server: Connection refused


etc. I even tested with N = 1 000 000, and I had no failures.

What I deduce from those tests, and the errors I get, is that the test is 
failing at the very end of each run (ie, we successfully connect and get 
results back until the last few iterations). 

Based on this information, I went a bit further, and noticed that we have an 
idle parameter that says that when we release a connection, it is moved back to 
the pool, but if the number of idle connections is above a value (8 by default), 
then the connection gets closed. My understanding is that when we reach the end 
of each loop (the N value), the threads complete one after the other, the 
connections get released, and when we reach the idle connection limit, we 
start closing the connections. So far, so good, but at the same time, some 
other threads are still pounding the server, and still requesting connections 
from the pool. 

What I see happening is that the connection which is handed to one of the 
active threads has just been closed because the number of idle connections has 
reached the limit, so the connection is in a bad state, which leads to an 
error. Changing the parameter that regulates the pool's maxIdle value has a 
huge impact on the test: if I set it to -1 (ie, we don't care about idle 
connections), then the test passes 100%. If I set the parameter to a value 
greater than the number of threads, then it passes too. OTOH, if I set the 
value to 0 (ie, each connection that is returned to the pool is immediately 
destroyed), the test fails.
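
The maxIdle rule described above can be sketched with a toy pool. This is
only an illustration under stated assumptions: ToyPool and ToyConnection
are hypothetical classes, not the commons-pool or LDAP API
implementation. The sketch shows what the three settings (-1, 0, and a
positive limit) do when connections are released; it does not reproduce
the failure itself, since here a closed connection is never put back
among the idle ones.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a pooled connection.
final class ToyConnection {
    private boolean closed;
    void close() { closed = true; }
    boolean isClosed() { return closed; }
}

// Hypothetical pool that only models the maxIdle rule.
final class ToyPool {
    private final int maxIdle;                 // negative means "no limit"
    private final Deque<ToyConnection> idle = new ArrayDeque<>();
    private int destroyed;

    ToyPool(int maxIdle) { this.maxIdle = maxIdle; }

    // Reuse an idle connection if there is one, otherwise create a new one.
    synchronized ToyConnection borrow() {
        ToyConnection c = idle.pollFirst();
        return c != null ? c : new ToyConnection();
    }

    // Keep the connection idle unless the idle limit is already reached,
    // in which case it is closed instead of being pooled.
    synchronized void release(ToyConnection c) {
        if (maxIdle >= 0 && idle.size() >= maxIdle) {
            c.close();
            destroyed++;
        } else {
            idle.addFirst(c);
        }
    }

    synchronized int idleCount() { return idle.size(); }
    synchronized int destroyedCount() { return destroyed; }
}
```

With ten connections borrowed and then released, a limit of 8 keeps eight
idle and closes two, -1 keeps all ten, and 0 closes every one of them, so
each borrow has to open a fresh connection to the server.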

So bottom line, there is a workaround: set maxIdle to -1, but the drawback is 
that the pool size may keep growing if the connections are never released and 
put back to the pool. The other workaround is to set this parameter to a value 
above the number of threads (not easy to determine).

At this point, I'm stuck: I have no idea why a connection that gets released 
can't be reallocated. It's clearly deep in the stack, either a problem in 
commons-pool or in MINA.

I'd like to rule out commons-pool and test commons-pool2, but it seems that the 
OSGi test does not like it, so here is my question: 
- how do we change the OSGi test to accept commons.pool2 as a dependency?

Any input would be very welcome!
