Caleb Spare created POOL-287:
--------------------------------
Summary: GKOP can lose objects over time due to swallowed NPE in
the evictor
Key: POOL-287
URL: https://issues.apache.org/jira/browse/POOL-287
Project: Commons Pool
Issue Type: Bug
Affects Versions: 2.3, 2.2
Reporter: Caleb Spare
We ran into this bug via a Redis library that uses a GenericKeyedObjectPool for
connection pooling. We found that over the course of several days, the
connections available to our application were dwindling.
Some relevant configuration:
testWhileIdle: false
numTestsPerEvictionRun: -1
minEvictableIdleTimeMillis: 60000
timeBetweenEvictionRunsMillis: 30000
maxTotalPerKey: 400
maxIdlePerKey: 400
(In a more minimal repro case I developed later, the problem happens much
faster if minEvictableIdleTimeMillis and timeBetweenEvictionRunsMillis are
reduced greatly, even down to 1.)
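For context on how often the evictor touches each object with these settings, here is a small sketch of the numTestsPerEvictionRun rule as I understand it from the Commons Pool documentation: a non-negative value caps the number of idle objects examined per run, while a negative value -n examines roughly one n-th of them. The class and method names (EvictionRunMath, testsPerRun) are made up for illustration and are not part of the library.

```java
// Hypothetical helper, not library code: models the documented
// numTestsPerEvictionRun rule so the effect of -1 is easy to see.
final class EvictionRunMath {

    /** Number of idle objects examined in a single eviction run. */
    static int testsPerRun(int numTestsPerEvictionRun, int totalIdle) {
        if (numTestsPerEvictionRun >= 0) {
            // Non-negative: examine at most that many idle objects.
            return Math.min(numTestsPerEvictionRun, totalIdle);
        }
        // Negative: examine ceil(totalIdle / |numTestsPerEvictionRun|) objects.
        return (int) Math.ceil(totalIdle / Math.abs((double) numTestsPerEvictionRun));
    }

    public static void main(String[] args) {
        System.out.println(testsPerRun(-1, 400)); // all idle objects in one run
        System.out.println(testsPerRun(-4, 400)); // a quarter of them
        System.out.println(testsPerRun(3, 400));  // capped at 3
    }
}
```

Under that rule, -1 means every idle object is put under eviction test on every run (every 30 seconds with the configuration above), so each connection is exposed to the race described below on each run.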
We discovered that this is what happens (looking at 2.3 code, but the problem
occurred with 2.2 and 2.3):
* In evict(), there is a variable idleObjects which starts as null
* The branch where idleObjects is set is not necessarily taken
* The null idleObjects is passed to underTest.endEvictionTest(idleObjects)
* If the object underTest had previously been borrowed and was thus set to the
EVICTION_RETURN_TO_HEAD state, then endEvictionTest throws an NPE
* By default this is silently swallowed and the evictor continues on
* Now the object that had been underTest is set to state IDLE, but is not in
the pool's idleObjects list, so it is lost
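The sequence above can be modeled without the library. The following is a simplified, single-threaded sketch based only on the steps listed here, not the actual Commons Pool source: the class EvictionLeakSketch, the Entry wrapper, and simulateLostObject are hypothetical stand-ins, and the assumption that the state is reset to IDLE before the queue is touched is inferred from the observed end state.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the failure; names do not come from Commons Pool.
final class EvictionLeakSketch {

    enum State { IDLE, EVICTION, EVICTION_RETURN_TO_HEAD }

    /** Stand-in for the pooled-object wrapper that is under eviction test. */
    static final class Entry {
        State state = State.IDLE;

        // Assumed ordering: the state becomes IDLE first, then the entry is
        // pushed back onto the idle queue. A null queue fails on the second step.
        boolean endEvictionTest(Deque<Entry> idleQueue) {
            if (state == State.EVICTION) {
                state = State.IDLE;
                return true;
            } else if (state == State.EVICTION_RETURN_TO_HEAD) {
                state = State.IDLE;
                idleQueue.offerFirst(this); // NPE when idleQueue is null
            }
            return false;
        }
    }

    /** Returns true if the entry ends up IDLE but missing from the idle queue. */
    static boolean simulateLostObject() {
        Deque<Entry> poolIdleObjects = new ArrayDeque<>();
        Entry underTest = new Entry();

        // The evictor takes the entry for testing...
        underTest.state = State.EVICTION;
        // ...and a borrow attempt in the meantime marks it for return to the head.
        underTest.state = State.EVICTION_RETURN_TO_HEAD;

        // The branch in evict() that assigns idleObjects was not taken.
        Deque<Entry> idleObjects = null;
        try {
            underTest.endEvictionTest(idleObjects);
        } catch (NullPointerException swallowed) {
            // Mirrors the default behavior: the exception is silently dropped.
        }

        return underTest.state == State.IDLE && !poolIdleObjects.contains(underTest);
    }

    public static void main(String[] args) {
        System.out.println("object lost: " + simulateLostObject());
    }
}
```

In this model the entry is marked IDLE yet is reachable from no queue, which matches the slow loss of connections we observed. The real race needs a borrower thread and the evictor thread interleaving; the sketch just replays the two state changes in order.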
If it would help, I can clean up my repro program and attach that, but it is
written in Clojure, not Java. I can also try to write a Java repro but I might
not have time to do that for a little while.
This bug can be avoided by disabling the evictor. That doesn't help us, though,
because we rely on background eviction to avoid running into server-side
connection timeouts.