Hi guys, I'm a bit stuck on an issue that Lucas has already run into (and raised a JIRA for a few months ago), an issue I thought we had fixed. Let me explain:
Shawn informed me that he gets some failures when using the LDAP API for a project he is working on. He is conducting some performance tests with Fortress, using the LDAP API and another LDAP server. Fortress and the external LDAP server are not at fault here: I was able to build a scenario that reproduces the problem without either of them, and the test was pushed this morning (LightweightLdapConnectionPoolTest.testManyConnections()).

This test spawns T threads, each of which does a lookup of the RootDSE N times. When the test is run with T=100 and various values of N, I get some errors at the end of the test.

For instance, when N = 2000, I get errors like:

    Failure to get a connection on iteration 1982 : Cannot connect on the server: Connection refused
    Thread Thread[Thread-320,5,main] failed after 1982 iterations in 7393ms
    Failure to get a connection on iteration 1974 : Cannot connect on the server: Connection refused
    ...

For N = 4000, I get such errors:

    Failure to get a connection on iteration 3923 : Cannot connect on the server: Connection refused
    Failure to get a connection on iteration 3969 : Cannot connect on the server: Connection refused
    ...

For N = 10 000:

    Failure to get a connection on iteration 9925 : Cannot connect on the server: Connection refused
    Failure to get a connection on iteration 9763 : Cannot connect on the server: Connection refused
    ...

For N = 20 000:

    Failure to get a connection on iteration 19890 : Cannot connect on the server: Connection refused
    Failure to get a connection on iteration 19956 : Cannot connect on the server: Connection refused

etc. I even tested with N = 1 000 000, and I had no failures.

What I deduce from those tests, and from the errors I get, is that the failures happen at the very end of each thread's loop (i.e. we successfully connect and get results back for all but the last few iterations).
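To make the load pattern concrete, here is a minimal sketch of its shape. It is not the code of testManyConnections(): the class name, the Lookup interface and runLoad() are hypothetical, and the lookup is a placeholder for "get a connection from the pool, look up the RootDSE, release the connection".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class ManyConnectionsSketch {

    /** Hypothetical stand-in for "borrow a connection, look up the RootDSE, release it". */
    interface Lookup {
        void run(int iteration) throws Exception;
    }

    /**
     * Spawns `threads` workers, each calling `lookup` up to `iterations` times.
     * A worker stops at its first failure. Returns the number of failed workers.
     */
    static int runLoad(int threads, int iterations, Lookup lookup) {
        AtomicInteger failedThreads = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(threads);

        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                long start = System.currentTimeMillis();
                try {
                    for (int i = 0; i < iterations; i++) {
                        try {
                            lookup.run(i);
                        } catch (Exception e) {
                            // Record the iteration at which this worker failed, then give up.
                            long elapsed = System.currentTimeMillis() - start;
                            System.out.println("Failure on iteration " + i + " : " + e.getMessage()
                                + " (" + Thread.currentThread() + ", " + elapsed + "ms)");
                            failedThreads.incrementAndGet();
                            return;
                        }
                    }
                } finally {
                    done.countDown();
                }
            }).start();
        }

        try {
            done.await(); // wait until every worker has finished or failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return failedThreads.get();
    }

    public static void main(String[] args) {
        // T = 100, N = 2000, with a no-op lookup in place of the real LDAP call.
        int failed = runLoad(100, 2000, i -> { });
        System.out.println(failed + " thread(s) failed");
    }
}
```

With a real lookup plugged in, the interesting number is the iteration at which each worker fails: in the runs above it is always within the last few dozen iterations of N.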
Based on this information, I went a bit further, and noticed that we have an idle parameter: when we release a connection, it is moved back to the pool, but if the number of idle connections is above a given value (8 by default), the connection gets closed instead.

My understanding is that when we reach the end of each loop (the N value), the threads complete one after the other, their connections get released, and once we reach the idle connections limit, we start destroying connections. So far, so good, but at the same time some other threads are still pounding the server and still claiming connections from the pool. What I see happening is that a connection handed to one of the still-active threads has just been closed because the number of idle connections reached the limit, so the connection is in a bad state, which leads to an error.

Changing the parameter that regulates the pool's maxIdle value has a huge impact on the test: if I set it to -1 (i.e. we don't care about idle connections), the test passes 100% of the time. If I set the parameter to a value greater than the number of threads, it passes too. OTOH, if I set the value to 0 (i.e. each connection that is returned to the pool is immediately destroyed), the test fails.

So, bottom line, there is a workaround: set maxIdle to -1. The drawback is that the pool may keep growing if the connections are never released and put back into the pool. The other workaround is to set this parameter to a value above the number of threads (not easy to determine).

At this point, I'm stuck: I have no idea why a connection that gets released can't be reallocated. It's clearly deep in the stack, either a problem in commons-pool or in MINA. I'd like to rule out commons-pool and test commons-pool2, but it seems that the OSGi test does not like it, so here is my question:

- how do we change the OSGi test to accept commons-pool2 as a dependency?

Any input would be very welcome!
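For reference, the maxIdle rule described above can be modelled in a few lines. This is a toy model under stated assumptions, not the commons-pool code: ToyPool, Conn and churn() are hypothetical names, and the rule implemented is only the one stated in this message (a released connection is closed once the idle count has reached maxIdle, and a negative maxIdle means no limit).

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class MaxIdleSketch {

    /** Hypothetical connection: only tracks whether the pool closed it. */
    static final class Conn {
        boolean closed;
    }

    /** Toy pool that applies the maxIdle rule on release; not the commons-pool implementation. */
    static final class ToyPool {
        private final int maxIdle;
        private final Deque<Conn> idle = new ArrayDeque<>();
        int created;
        int destroyed;

        ToyPool(int maxIdle) {
            this.maxIdle = maxIdle;
        }

        synchronized Conn borrow() {
            Conn c = idle.pollFirst();
            if (c == null) {          // nothing idle: create a new connection
                c = new Conn();
                created++;
            }
            return c;
        }

        synchronized void release(Conn c) {
            if (maxIdle >= 0 && idle.size() >= maxIdle) {
                c.closed = true;      // idle limit reached: close instead of keeping it
                destroyed++;
            } else {
                idle.addFirst(c);     // keep it for the next borrower
            }
        }

        synchronized int idleCount() {
            return idle.size();
        }
    }

    /** Borrows `concurrent` connections, releases them all, returns {created, destroyed, idle}. */
    static int[] churn(int maxIdle, int concurrent) {
        ToyPool pool = new ToyPool(maxIdle);
        Conn[] held = new Conn[concurrent];
        for (int i = 0; i < concurrent; i++) {
            held[i] = pool.borrow();
        }
        for (Conn c : held) {
            pool.release(c);
        }
        return new int[] { pool.created, pool.destroyed, pool.idleCount() };
    }

    public static void main(String[] args) {
        for (int maxIdle : new int[] { 8, -1, 0 }) {
            int[] r = churn(maxIdle, 100);
            System.out.println("maxIdle=" + maxIdle + " created=" + r[0]
                + " destroyed=" + r[1] + " idle=" + r[2]);
        }
    }
}
```

In this model a closed connection is never put back on the idle list, so a borrower can never receive one. That is exactly why the observed behaviour is surprising: the model shows how many connections get destroyed when the threads finish, but it does not reproduce the failure itself.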
