This seems interesting and has a low bar to entry: https://github.com/arosien/failure
On Tue, Nov 15, 2016 at 4:01 PM, Edward Capriolo <[email protected]> wrote:
> I was doing some load testing and I found that the current gating factor
> for max instances running in the same JVM is the JMX-based
> notification system the failure detector uses.
>
> Currently a cluster of N requires N * (N-1) JMX notification threads. I
> started attempting to remove this limit without going into building the
> accrual failure detector (22), but there were some nuanced bugs and I backed
> off because it did not seem worth the change.
>
> If anyone has any literature to contribute about building a consensus-based
> failure detector, please discuss. Once we cut this release, that is likely
> where I will spend my attention.
>
> Thanks,
> Edward
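
For a sense of scale, here is a rough sketch of how the N * (N-1) thread count quoted above grows with cluster size. The class and method names are hypothetical and only illustrate the arithmetic; this is not code from the project.

```java
public final class NotificationThreadCount {

    // Hypothetical helper: assumes each of the n in-JVM instances holds one
    // notification thread per peer, which gives n * (n - 1) threads in total.
    static long required(long n) {
        return n * (n - 1);
    }

    public static void main(String[] args) {
        // Print the thread count for a few illustrative cluster sizes.
        for (long n : new long[] {10, 50, 100, 500}) {
            System.out.println(n + " instances -> " + required(n) + " threads");
        }
    }
}
```

At 100 instances that is already 9900 threads in one JVM, so the quadratic growth would plausibly be the first limit a load test hits.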
