Relatively new to the ZK code base, please be gentle. :) This is bordering
on a question for users@, but I'm asking here because I'm more than happy
to try and dig into the code if it's not too far beyond my reach -- hope
that's okay.
I'm trying to dig into / work around ZOOKEEPER-2938.
Unfortunately, the proposed work-around (simply restarting the leader)
isn't particularly great for us because of some limitations in our
automation -- so I'm trying to see if we can find some alternatives and/or
fix the issue properly.
As far as I can tell, the "unhappy"/prospective member of the quorum
attempts to connect to the other, established members, sends a challenge
request (which seems to be a simple payload consisting of its ID and the
local election host + port), then promptly closes the connection because
its own ID is less than that of the recipient(s) -- seemingly without
waiting for a response.
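To make the behaviour I'm describing concrete, here is a minimal sketch of
the rule as I understand it from reading the code. The class and method
names are made up for illustration and are not taken from the ZooKeeper
source; it only models the ID comparison, not the actual socket handling.

```java
// Hypothetical sketch of the observed rule; names are illustrative only.
public final class ChallengeSketch {

    /**
     * Returns true if the peer that opened the connection closes it right
     * after sending its challenge (its ID plus election host + port).
     * Assumption: the only input to the decision is the two peer IDs.
     */
    public static boolean shouldCloseAfterChallenge(long localId, long remoteId) {
        // Observed behaviour: the initiator drops the connection when its
        // own ID is less than the recipient's ID.
        return localId < remoteId;
    }

    public static void main(String[] args) {
        long joiningPeer = 1;            // the "unhappy"/prospective member
        long[] establishedPeers = {2, 3};
        for (long remote : establishedPeers) {
            String outcome = shouldCloseAfterChallenge(joiningPeer, remote)
                    ? "close" : "keep";
            System.out.printf("peer %d -> peer %d: %s%n", joiningPeer, remote, outcome);
        }
    }
}
```

With a joining peer whose ID is lower than every established member, each
outbound connection ends up closed by the joining peer itself, which matches
what I'm seeing.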
The mechanics are all easy enough to understand, but I feel like I'm
lacking some context RE: what's *supposed* to happen here. When this code
is all working as expected, what *should* happen with respect to these
challenges? What is this code trying to achieve by forcefully disconnecting
from peers with an ID greater than the local peer?
I also don't fully understand why restarting the leader would fix things,
but that's probably just something I need to dive into to get to the bottom
of.
Appreciate any guidance.