Hi,

During leader switches, we observe connection imbalance among our
observers: some observers become overloaded with a large number of
connections, which disrupts our capacity estimates.

Further investigation revealed that during leader election in Apache
ZooKeeper, two types of threads run concurrently on the leader node:

   - The QuorumPeer thread, which manages the quorum protocol.
   - The LearnerHandler threads (one per connected learner), which handle
     synchronization of the leader with its learners.

Because these threads run concurrently, during the leader's bootstrap
process some observers may synchronize with the leader before the leader
transitions to the broadcast state, while others synchronize only after the
leader has entered the broadcast state.
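To make the race concrete, here is a minimal Python sketch. It is not ZooKeeper code: LeaderState, enter_broadcast, snapshot and run are hypothetical names, and the two zxid values are copied from the logs below. The sketch forces each of the two possible interleavings with a threading.Event so the outcome is deterministic; in the real leader, the ordering depends on thread scheduling and on when each learner connects.

```python
import threading


class LeaderState:
    """Hypothetical stand-in for the leader's shared state (not ZooKeeper's class)."""

    def __init__(self, max_committed: int):
        self._lock = threading.Lock()
        self.max_committed = max_committed
        self.broadcasting = False

    def enter_broadcast(self, new_max_committed: int) -> None:
        # QuorumPeer-like thread: switch to broadcast; the log advances into the new epoch.
        with self._lock:
            self.broadcasting = True
            self.max_committed = new_max_committed

    def snapshot(self) -> tuple[bool, int]:
        # LearnerHandler-like thread: read the state it will sync the learner against.
        with self._lock:
            return self.broadcasting, self.max_committed


def run(handler_first: bool) -> tuple[bool, int]:
    state = LeaderState(0x2B300005CF4)
    handler_done = threading.Event()
    result = {}

    def quorum_peer():
        if handler_first:
            handler_done.wait()  # force the interleaving in which the handler reads first
        state.enter_broadcast(0x2B400000032)

    def learner_handler():
        result["seen"] = state.snapshot()
        handler_done.set()

    qp = threading.Thread(target=quorum_peer)
    lh = threading.Thread(target=learner_handler)
    qp.start()
    if not handler_first:
        qp.join()  # let the QuorumPeer-like thread finish before the handler reads
    lh.start()
    qp.join()
    lh.join()
    return result["seen"]


for handler_first in (True, False):
    broadcasting, max_committed = run(handler_first)
    print(f"handler_first={handler_first}: broadcasting={broadcasting}, "
          f"maxCommittedLog=0x{max_committed:x}")
# handler_first=True: broadcasting=False, maxCommittedLog=0x2b300005cf4
# handler_first=False: broadcasting=True, maxCommittedLog=0x2b400000032
```

A handler that reads first sees the pre-broadcast state (like 14, 65 and 119); a handler that reads after the transition sees the advanced log (like 82 and 110).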

For example, suppose 14, 65, 119, 82 and 110 are the server IDs of five
learners.


   1. Initially, 14, 65, 119, 82 and 110 are all in sync with the leader
      (peerLastZxid=0x2b300005cf4).
   2. A leader switch happens.
   3. The new leader sends an empty DIFF to 14, 65 and 119 (since
      lastProcessedZxid==peerLastZxid).
   4. 14, 65 and 119 start accepting connections.
   5. The leader's epoch increases, the leader transitions to the broadcast
      state, and maxCommittedLog changes from 0x2b300005cf4 to 0x2b400000032.
   6. The leader sends a committedLog DIFF to 82 and 110, since their zxid
      (peerLastZxid) is still 0x2b300005cf4.
   7. 82 and 110 transition to broadcast and start accepting connections
      much later than 14, 65 and 119, leading to connection imbalance.
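The choice the leader makes in steps 3 and 6 can be sketched roughly as follows. This is a simplified, hypothetical model (choose_sync and its return labels are made-up names) that covers only the two branches visible in the logs: an empty DIFF when peerLastZxid equals lastProcessedZxid, and a DIFF served from the in-memory committedLog when peerLastZxid lies between minCommittedLog and maxCommittedLog. The real LearnerHandler logic has further cases (TRUNC, SNAP, on-disk txn log sync) that are lumped together here.

```python
def choose_sync(peer_last_zxid: int, last_processed_zxid: int,
                min_committed: int, max_committed: int) -> str:
    """Simplified, hypothetical model of how a leader picks a sync mode for a learner."""
    if peer_last_zxid == last_processed_zxid:
        # Learner already has everything the leader has processed: nothing to send.
        return "EMPTY_DIFF"
    if min_committed <= peer_last_zxid <= max_committed:
        # Learner is behind, but its zxid is still covered by the in-memory committedLog.
        return "DIFF_FROM_COMMITTED_LOG"
    # TRUNC, SNAP and on-disk txn log sync are not modeled in this sketch.
    return "OTHER"


# Values copied from the "Synchronizing with Learner" log lines.
print(choose_sync(0x2B300005CF4, 0x2B300005CF4, 0x2B300005B00, 0x2B300005CF4))  # sid 14 -> EMPTY_DIFF
print(choose_sync(0x2B300005CF4, 0x2B400000032, 0x2B300005B32, 0x2B400000032))  # sid 110 -> DIFF_FROM_COMMITTED_LOG
```

With the logged values, sid 14 (like 65 and 119) gets an empty DIFF because it was synced before the leader's lastProcessedZxid moved past 0x2b300005cf4, while sid 110 (like 82) needs a committedLog DIFF because by then the leader's log had already advanced to 0x2b400000032.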


Logs:

2024-06-25 06:53:17,508 [myid:] - INFO
[LearnerHandler-/10.155.16.87:57098:?@?]
- Synchronizing with Learner sid: 14 maxCommittedLog=0x2b300005cf4
minCommittedLog=0x2b300005b00 lastProcessedZxid=0x2b300005cf4
peerLastZxid=0x2b300005cf4

2024-06-25 06:53:17,508 [myid:] - INFO
[LearnerHandler-/10.155.23.246:44712:?@?]
- Synchronizing with Learner sid: 65 maxCommittedLog=0x2b300005cf4
minCommittedLog=0x2b300005b00
lastProcessedZxid=0x2b300005cf4 peerLastZxid=0x2b300005cf4

2024-06-25 06:53:17,508 [myid:] - INFO
[LearnerHandler-/10.155.178.245:49152:?@?] - Synchronizing with Learner
sid: 119 maxCommittedLog=0x2b300005cf4 minCommittedLog=0x2b300005b00
lastProcessedZxid=0x2b300005cf4 peerLastZxid=0x2b300005cf4

-------------------------------

2024-06-25 06:53:18,467 [myid:] - INFO
[QuorumPeer[myid=4](plain=[0:0:0:0:0:0:0:0]:12913)(secure=[0:0:0:0:0:0:0:0]:12912):?@?]
- Peer state changed: leading - broadcast

-------------------------------

2024-06-25 06:53:23,071 [myid:] - INFO
[LearnerHandler-/10.199.145.252:59972:?@?] - On disk txn sync enabled with
snapshotSizeFactor 0.33

2024-06-25 06:53:23,071 [myid:] - INFO
[LearnerHandler-/10.199.145.252:59972:?@?] - Synchronizing with Learner
sid: 110 maxCommittedLog=0x2b400000032 minCommittedLog=0x2b300005b32
lastProcessedZxid=0x2b400000032 peerLastZxid=0x2b300005cf4

2024-06-25 06:53:23,071 [myid:] - INFO
[LearnerHandler-/10.199.145.252:59972:?@?] - Using committedLog for peer
sid: 110

2024-06-25 06:53:23,072 [myid:] - INFO
[LearnerHandler-/10.155.180.220:46648:?@?] - On disk txn sync enabled with
snapshotSizeFactor 0.33

2024-06-25 06:53:23,072 [myid:] - INFO
[LearnerHandler-/10.155.180.220:46648:?@?] - Synchronizing with Learner
sid: 82 maxCommittedLog=0x2b400000032 minCommittedLog=0x2b300005b32
lastProcessedZxid=0x2b400000032 peerLastZxid=0x2b300005cf4

2024-06-25 06:53:23,072 [myid:] - INFO
[LearnerHandler-/10.155.180.220:46648:?@?] - Using committedLog for peer
sid: 82
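A zxid packs the leader epoch into its high 32 bits and a per-epoch counter into its low 32 bits, so the logs can be read directly: the late learners were synced against a committedLog that already contained transactions from the next epoch. The small helper below (split_zxid is a hypothetical name) applies that bit layout to the logged values:

```python
def split_zxid(zxid: int) -> tuple[int, int]:
    """Split a zxid into (epoch, counter): epoch in the high 32 bits, counter in the low 32."""
    return zxid >> 32, zxid & 0xFFFFFFFF


for label, zxid in [("peerLastZxid", 0x2B300005CF4),
                    ("maxCommittedLog after broadcast", 0x2B400000032)]:
    epoch, counter = split_zxid(zxid)
    print(f"{label}: epoch=0x{epoch:x} counter=0x{counter:x}")
# peerLastZxid: epoch=0x2b3 counter=0x5cf4
# maxCommittedLog after broadcast: epoch=0x2b4 counter=0x32
```

The epoch moves from 0x2b3 to 0x2b4, which matches step 5: by the time 82 and 110 were synced, the leader had already committed 0x32 transactions in the new epoch.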


Questions:

   1. Why does the new leader start synchronization (via an empty DIFF) with
      some observers (14, 65, 119) before others (82, 110)?
   2. Could all synchronization happen either before or after the leader's
      epoch is incremented and it transitions to the broadcast state? Why
      does the current implementation not behave this way?


Regards,
Abhilash
