[pool] Resilience against factory outages (POOL-407)

Phil Steitz Fri, 31 May 2024 11:20:41 -0700

I just committed a first attempt at resilience against factory outages,
intended as a fix for POOL-407 and many similar issues reported over the
years.  The scenario in POOL-407 is common when resource providers (like
databases) go down:


1. makeObject requests start to fail and threads line up waiting on the
deque.
2. The provider comes back up, so makeObject calls would succeed again, but
the clients, the pool and the factory are all unaware of this, so no
clients get served.

What I just committed puts the resilience responsibility on the factory,
having it monitor itself.  That responsibility could arguably be put
instead on the pool.

To use the feature as is, you need to create a ResilientPooledObjectFactory
(RPOF) wrapping a PooledObjectFactory, configure it, attach it to its pool
and start its monitor.  The formerly disabled GenericObjectPool (GOP) test,
testLivenessOnTransientFactoryFailure, shows how to do it.  The setup is a
little awkward.  I would appreciate feedback on the following options for
how to improve it (or any other comments on the code):

0) Roll it back and come up with something better.
1) Leave it as is.
2) Add a GOP config option that results in its factory being wrapped
automatically in an RPOF.
3) Move the functionality into the pool.
The other thing that needs to be designed is how to make the strategy for
proactive make attempts configurable.  It is hard-coded now in the RPOF
runChecks and the Adder inner class.  The initial implementation is
primitive: the monitor watches the makeObject log, and any failure starts
an Adder that tries addObject with a configurable delay and a (hard-coded)
maximum number of failures.  Once the circular log is filled with
successes, the adder is turned off.
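
As a rough illustration of the strategy described above, here is a minimal
sketch of the decision logic only.  It is not the committed code: MakeLog,
AdderSwitch and their methods are hypothetical names, and the sketch
assumes the monitor calls runCheck once per check cycle.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical fixed-size circular log of makeObject outcomes (true = success). */
final class MakeLog {
    private final int capacity;
    private final Deque<Boolean> events = new ArrayDeque<>();

    MakeLog(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records one outcome, evicting the oldest entry when the log is full. */
    synchronized void record(boolean success) {
        if (events.size() == capacity) {
            events.removeFirst();
        }
        events.addLast(success);
    }

    /** True if any retained entry is a failure. */
    synchronized boolean hasFailure() {
        return events.contains(Boolean.FALSE);
    }

    /** True only when the log is at capacity and every entry is a success. */
    synchronized boolean isFullOfSuccesses() {
        return events.size() == capacity && !events.contains(Boolean.FALSE);
    }
}

/** Hypothetical monitor logic deciding whether the adder should be running. */
final class AdderSwitch {
    private final MakeLog log;
    private boolean adderRunning;

    AdderSwitch(MakeLog log) {
        this.log = log;
    }

    /** One check cycle; returns whether the adder should run afterwards. */
    synchronized boolean runCheck() {
        if (adderRunning) {
            // Turn the adder off only once the log holds nothing but successes.
            if (log.isFullOfSuccesses()) {
                adderRunning = false;
            }
        } else if (log.hasFailure()) {
            // Any logged failure starts the adder.
            adderRunning = true;
        }
        return adderRunning;
    }
}
```

With a log of size 3, a single recorded failure switches the adder on, and
it stays on until three consecutive successes have pushed that failure out
of the log.  Making the strategy configurable would presumably mean
replacing the two conditions in runCheck with pluggable ones.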

Also, RPOF spawns a monitoring thread and, when it detects a transient
failure, an adder thread.  Careful review - and improvement - of the
management of these threads would be appreciated.  I tried to make sure,
and added tests to confirm, that closing the pool kills these threads.
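
For reviewers thinking about the thread management, here is a small
stand-alone sketch of an adder thread that stops either after too many
failures or when it is closed.  It is an illustration under assumptions,
not the RPOF code: RetryingAdder and all its members are hypothetical, the
Callable stands in for the pool's addObject, and failures are counted in
total rather than consecutively.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical adder: retries an add action until closed or too many failures. */
final class RetryingAdder implements Runnable, AutoCloseable {
    private final Callable<Void> addAction;  // stand-in for pool.addObject()
    private final long delayMillis;
    private final int maxFailures;
    private final AtomicInteger failures = new AtomicInteger();
    private volatile boolean closed;
    private volatile Thread thread;

    RetryingAdder(Callable<Void> addAction, long delayMillis, int maxFailures) {
        this.addAction = addAction;
        this.delayMillis = delayMillis;
        this.maxFailures = maxFailures;
    }

    /** Starts the background thread once; later calls do nothing. */
    synchronized void start() {
        if (thread == null) {
            thread = new Thread(this, "adder-sketch");
            thread.setDaemon(true);
            thread.start();
        }
    }

    @Override
    public void run() {
        while (!closed && failures.get() < maxFailures) {
            try {
                addAction.call();
            } catch (Exception e) {
                failures.incrementAndGet();
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Interrupted by close(): restore the flag and exit promptly.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Signals the loop to stop, interrupts any sleep, and waits briefly. */
    @Override
    public synchronized void close() {
        closed = true;
        Thread t = thread;
        if (t != null) {
            t.interrupt();
            try {
                t.join(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    boolean isAlive() {
        Thread t = thread;
        return t != null && t.isAlive();
    }

    int failureCount() {
        return failures.get();
    }
}
```

The point of the sketch is that close() both sets a flag and interrupts
the sleeping thread, so shutdown does not have to wait out the delay.  A
pool close hook would call close() on whatever adder and monitor threads
are active.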

Phil
