Tim, +1 from me; feel free to open a JIRA and send a PR with your proposals.
Enrico

On Fri, Nov 12, 2021 at 03:21 Cameron McKenzie <cammcken...@apache.org> wrote:

> I think this sounds like an interesting feature that would be useful to the
> community. Modifying the existing LeaderLatch would seem like the sensible
> option to me.
> cheers
>
> On Fri, Nov 12, 2021 at 11:01 AM Tim Black <blac...@gmail.com> wrote:
>
> > Hello, everyone! I've been using Curator for several years now, and it
> > has definitely made my life much easier when it comes to ZooKeeper. The
> > issue I'm currently having involves the way leadership elections are
> > determined by the order of connection, with no randomness injected. I
> > found a previous discussion here, Archived Message Thread
> > <https://apache.markmail.org/message/4fufdan25fgczjen?q=list:curator+leaderlatch+random#query:list%3Acurator%20leaderlatch%20random+page:1+mid:dujexkniehmk4abt+state:results>,
> > which indicates that this is the intended behavior, but in our
> > operational environment we're experiencing some problems due to this
> > approach.
> >
> > In my scenario, we have multiple software agents servicing N external
> > client connections. When we start up a client connection, we have the
> > ability to target which agent runs that particular connection on startup
> > (a preferred starting place). After startup, each client connection uses
> > a separate LeaderLatch so that other agents can monitor its status, and
> > if one agent is shut down, the connections it was servicing would, in
> > theory, spread out amongst the remaining agents. However, the behavior
> > we were seeing is that all connections would go to one single agent.
> > During patches/upgrades, we would do a rolling restart, and all
> > connections would end up on the first agent restarted. To rebalance, we
> > would have to manually shut down and re-enable individual connections,
> > which is a much slower process than the automatic leadership
> > election/switch. While researching this, I came across the thread
> > mentioned above.
> >
> > To solve the problem for myself, I modified the LeaderLatch code, moving
> > away from the ephemeral sequential node solution it uses. The latch
> > nodes are still named using a "latch-[number]" format, but now the
> > number is generated by adding a random amount from 1-50 to the current
> > leader's index, retrying if a node with that number already exists. This
> > effectively randomizes who the next leader will be across all latches,
> > instead of all of them being determined by connection order.
> >
> > I'm still testing this modification locally. It passes all test cases,
> > and I'm working on verifying it in a test environment. My primary
> > question is whether the dev community would be interested in this
> > implementation, either as an update to the LeaderLatch class or as a
> > separate recipe (RandomLeaderLatch?). Based on the conversation linked
> > above, I didn't want to open an issue without discussing it here first.
> >
> > Sorry if this is a bit long-winded, but I'm trying to cover both what my
> > use case is for this, as well as at least a general idea of the solution
> > that I've implemented and am proposing for the project.
> >
> > --
> > Tim Black
> > The Law of Software Entomology:
> > There is always one more bug.
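For readers following along, here is a minimal sketch of the randomized numbering scheme Tim describes. This is not the actual patch; the class and method names are hypothetical, and it only models the node-name selection step (the real change would live inside LeaderLatch's ZooKeeper node creation logic, where a create collision, rather than a list check, would trigger the retry):

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: instead of relying on ZooKeeper's ephemeral-sequential
// counter (which orders latches strictly by connection order), pick the next
// latch number by adding a random offset of 1-50 to the current leader's
// index, retrying if that number is already taken.
public class RandomLatchNumbering {
    static final String PREFIX = "latch-";

    // existingNodes: child node names under the latch path,
    // e.g. "latch-0000000003". leaderIndex: the current leader's number.
    static String nextLatchNodeName(List<String> existingNodes, long leaderIndex) {
        while (true) {
            // nextLong(1, 51) yields a value in [1, 50]
            long candidate = leaderIndex + ThreadLocalRandom.current().nextLong(1, 51);
            String name = String.format("%s%010d", PREFIX, candidate);
            if (!existingNodes.contains(name)) {
                return name;
            }
            // Collision: loop and draw again. A production version would need
            // a bound on retries (or a wider offset range) if the window fills.
        }
    }
}
```

Because each new participant lands at a random position within 50 slots of the leader rather than strictly after the last joiner, the successor on leader failure is effectively randomized across agents instead of always being the earliest-connected one.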