I just committed a first attempt at providing the above, intended as a fix for POOL-407 and a lot of similar issues reported over the years. The scenario in POOL-407 is common when resource providers (like databases) go down:
1. makeObject requests start to fail and threads line up waiting on the deque.
2. The provider comes back up so makes will succeed again, but the clients, the pool and the factory are all ignorant of this fact, so no clients get served.

What I just committed puts the resilience responsibility on the factory, having it monitor itself. That responsibility could arguably be put instead on the pool.

To use the feature as is, you need to create a ResilientPooledObjectFactory wrapping a PooledObjectFactory, configure it, attach it to its pool and start its monitor. The formerly disabled GOP test, testLivenessOnTransientFactoryFailure, shows how to do it.

The setup is a little awkward. I would appreciate feedback on the following options for how to improve it (or any other comments on the code):

0) Roll it back and come up with something better
1) Leave as is
2) Add a GOP config that results in its factory being wrapped automatically in a RPOF
3) Move the functionality into the pool

The other thing that needs to be designed is how to make the proactive make attempt strategy configurable. It is hard-coded now in the RPOF runChecks and the Adder inner class. The initial implementation is primitive: monitor the makeObject log. Any failure triggers the start of an Adder that tries addObject with configurable delay and (hard-coded) max failures. Once the circular log becomes filled with successes, turn the adder off.

Also, RPOF spawns a monitoring thread and, when it detects a transient failure, an adder thread. Careful review - and improvement - of the management of these threads would be appreciated. I tried to make sure, and added tests to confirm, that closing the pool kills these threads.

Phil
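
The monitoring strategy described above (any logged failure starts the adder; a circular log full of successes stops it) can be sketched in a few lines of Java. This is a minimal, single-threaded illustration under stated assumptions, not the committed RPOF code: the class MakeLogSketch and its methods record, runChecks and isAdderRunning are hypothetical names chosen for the example, and the adder is reduced to a boolean flag, whereas the real implementation runs the checks and the adder on their own threads.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the "monitor the makeObject log" strategy. */
class MakeLogSketch {
    private final int logSize;                       // capacity of the circular log
    private final Deque<Boolean> log = new ArrayDeque<>();
    private boolean adderRunning;                    // stands in for the adder thread

    MakeLogSketch(int logSize) {
        this.logSize = logSize;
    }

    /** Record one makeObject outcome, evicting the oldest entry when the log is full. */
    synchronized void record(boolean success) {
        if (log.size() == logSize) {
            log.removeFirst();
        }
        log.addLast(success);
    }

    /** One monitor pass: start the adder on any failure, stop it once the log is all successes. */
    synchronized void runChecks() {
        boolean anyFailure = log.contains(Boolean.FALSE);
        if (!adderRunning && anyFailure) {
            adderRunning = true;
        } else if (adderRunning && log.size() == logSize && !anyFailure) {
            adderRunning = false;
        }
    }

    synchronized boolean isAdderRunning() {
        return adderRunning;
    }
}
```

With a log size of 3, a single recorded failure turns the flag on, and it stays on until three consecutive successes have pushed that failure out of the log. Making the strategy configurable would amount to making the two conditions in runChecks pluggable.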