Tim, +1 from me; feel free to open a JIRA and send a PR with your proposals.
Enrico

On Fri, Nov 12, 2021 at 03:21 Cameron McKenzie <cammcken...@apache.org> wrote:

> I think this sounds like an interesting feature that would be useful to the
> community. Modifying the existing LeaderLatch would seem like the sensible
> option to me.
> cheers
>
> On Fri, Nov 12, 2021 at 11:01 AM Tim Black <blac...@gmail.com> wrote:
>
> > Hello, everyone! I've been using Curator for several years now, and it
> > has definitely made my life much easier when it comes to ZooKeeper. The
> > issue I'm currently having involves the way leadership elections are
> > determined by the order of connection, with no randomness injected. I
> > found a previous discussion here, Archived Message Thread
> > <https://apache.markmail.org/message/4fufdan25fgczjen?q=list:curator+leaderlatch+random#query:list%3Acurator%20leaderlatch%20random+page:1+mid:dujexkniehmk4abt+state:results>,
> > which indicates that this is the intended behavior, but in our
> > operational environment we're experiencing some problems due to this
> > approach.
> >
> > In my scenario, we have multiple software agents servicing N external
> > client connections. When we start up a client connection, we have the
> > ability to target which agent runs that particular connection on startup
> > (a preferred starting place). After startup, each client connection uses
> > a separate LeaderLatch so that other agents can monitor its status, and
> > if one agent is shut down, the connections it was servicing would, in
> > theory, spread out amongst the remaining agents. However, the behavior
> > we were seeing is that all connections would go to one single agent.
> > During patches/upgrades, we would do a rolling restart, and all
> > connections would end up on the first agent restarted. To rebalance, we
> > would have to manually shut down and re-enable individual connections,
> > which is a much slower process than the automatic leadership
> > election/switch. While researching this, I came across the thread
> > mentioned above.
> >
> > To solve the problem for myself, I modified the LeaderLatch code, moving
> > away from the ephemeral sequential node solution it uses. The latch
> > nodes are still named using a "latch-[number]" format, but now the
> > number is generated by adding a random amount from 1-50 to the current
> > leader's index, retrying if a node with that number already exists. This
> > effectively randomizes who the next leader will be across all latches,
> > instead of all of them being determined by connection order.
> >
> > I'm still testing this modification locally. It passes all test cases,
> > and I'm working on verifying it in a test environment. My primary
> > question is whether the dev community would be interested in this
> > implementation, either as an update to the LeaderLatch class or as a
> > separate recipe (RandomLeaderLatch?). Based on the conversation linked
> > above, I didn't want to open an issue without discussing it here first.
> >
> > Sorry if this is a bit long-winded, but I'm trying to cover both what my
> > use case is for this, as well as at least a general idea of the solution
> > that I've implemented and am proposing for the project.
> >
> > --
> > Tim Black
> > The Law of Software Entomology:
> > There is always one more bug.
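For readers following along, here is a minimal sketch of the randomized numbering scheme Tim describes. This is not the actual patch; the class and method names are hypothetical, and it only models the node-name selection step (the real change would live inside LeaderLatch's ZooKeeper node creation logic, where a create collision, rather than a list check, would trigger the retry):

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: instead of relying on ZooKeeper's ephemeral-sequential
// counter (which orders latches strictly by connection order), pick the next
// latch number by adding a random offset of 1-50 to the current leader's
// index, retrying if that number is already taken.
public class RandomLatchNumbering {
    static final String PREFIX = "latch-";

    // existingNodes: child node names under the latch path,
    // e.g. "latch-0000000003". leaderIndex: the current leader's number.
    static String nextLatchNodeName(List<String> existingNodes, long leaderIndex) {
        while (true) {
            // nextLong(1, 51) yields a value in [1, 50]
            long candidate = leaderIndex + ThreadLocalRandom.current().nextLong(1, 51);
            String name = String.format("%s%010d", PREFIX, candidate);
            if (!existingNodes.contains(name)) {
                return name;
            }
            // Collision: loop and draw again. A production version would need
            // a bound on retries (or a wider offset range) if the window fills.
        }
    }
}
```

Because each new participant lands at a random position within 50 slots of the leader rather than strictly after the last joiner, the successor on leader failure is effectively randomized across agents instead of always being the earliest-connected one.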