Kezhu Wang created ZOOKEEPER-4883:
-------------------------------------

             Summary: Rollover leader epoch when counter part of zxid reach 
limit
                 Key: ZOOKEEPER-4883
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4883
             Project: ZooKeeper
          Issue Type: Improvement
          Components: server
            Reporter: Kezhu Wang


Currently, zxid rollover triggers a leader re-election (ZOOKEEPER-1277), which 
is time-consuming.

ZOOKEEPER-2789 proposes to use 24 bits for epoch and 40 bits for counter. I do 
think it is promising as [it extends the rollover interval from 49.7 days to 
34.9 years assuming 1k/s 
ops|https://github.com/apache/zookeeper/pull/2164#issuecomment-2368107479].
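
The two figures can be checked with a few lines of arithmetic. This is only a sketch: the class and method names are made up for illustration, and it assumes a steady 1,000 writes per second, a 32-bit counter today, and a 365-day year.

```java
import java.util.Locale;

/** Hypothetical helper, not part of ZooKeeper: estimates time until the zxid counter is exhausted. */
final class ZxidRolloverInterval {
    /** Seconds until a counter with the given bit width wraps at a steady write rate. */
    static double secondsUntilRollover(int counterBits, double opsPerSecond) {
        return Math.pow(2, counterBits) / opsPerSecond;
    }

    public static void main(String[] args) {
        double opsPerSecond = 1_000.0;      // assumed load, as in the linked comment
        double secondsPerDay = 86_400.0;

        double days32 = secondsUntilRollover(32, opsPerSecond) / secondsPerDay;
        double years40 = secondsUntilRollover(40, opsPerSecond) / secondsPerDay / 365.0;

        // Prints 49.7 days and 34.9 years respectively.
        System.out.println(String.format(Locale.ROOT, "32-bit counter: %.1f days", days32));
        System.out.println(String.format(Locale.ROOT, "40-bit counter: %.1f years", years40));
    }
}
```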

But I think it is a one-way ticket. The change of data format may also require 
a community-wide effort to upgrade third-party libraries/tools if they are tied 
to the current layout. Inside ZooKeeper, `acceptedEpoch` and `currentEpoch` are 
tied to `zxid`. Given a snapshot and a txn log, we probably need to deduce 
those two epoch values to join the quorum.
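
To illustrate why the bit layout matters to anything that decodes a zxid, here is a minimal sketch. It assumes the epoch sits in the high bits and the counter in the low bits of a 64-bit zxid; `ZxidLayout` and its methods are hypothetical names for this example, not ZooKeeper's own utilities.

```java
/** Hypothetical illustration of splitting a 64-bit zxid into epoch and counter. */
final class ZxidLayout {
    static long makeZxid(long epoch, long counter, int counterBits) {
        long mask = (1L << counterBits) - 1;
        return (epoch << counterBits) | (counter & mask);
    }

    static long epochOf(long zxid, int counterBits) {
        return zxid >>> counterBits;
    }

    static long counterOf(long zxid, int counterBits) {
        return zxid & ((1L << counterBits) - 1);
    }

    public static void main(String[] args) {
        // A zxid written with a 32-bit counter: epoch 5, counter 3.
        long zxid = makeZxid(5, 3, 32);

        // Decoded with the layout it was written in: epoch=5 counter=3
        System.out.println("32/32: epoch=" + epochOf(zxid, 32)
                + " counter=" + counterOf(zxid, 32));

        // The same bits decoded as 24/40: epoch=0 counter=0x500000003
        System.out.println("24/40: epoch=" + epochOf(zxid, 40)
                + " counter=0x" + Long.toHexString(counterOf(zxid, 40)));
    }
}
```

The same 64 bits yield a different epoch under each split, so every reader of existing snapshots and txn logs would have to know which layout produced them.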

So I present an alternative solution: roll over the leader epoch when the 
counter part of the zxid reaches its limit.

# Treat the last proposal of an epoch as the rollover proposal.
# Requests from the next epoch are proposed normally.
# Fence the next epoch once the rollover proposal is persisted.
# Proposals from the next epoch are not written to disk before the rollover 
proposal is committed.
# The leader commits the rollover proposal once it gets quorum ACKs.
# Blocked new-epoch proposals are logged once the rollover proposal is 
committed on the corresponding nodes.
 
This results in:

# No other leader could lead using the next epoch number once the rollover 
proposal is considered committed.
# No proposals from the next epoch will be written to disk before the rollover 
proposal is considered committed.
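
The per-node behaviour described in the steps above could be sketched roughly as follows. This is not the code in the branch: `RolloverGate` and all its members are hypothetical, the txn log is modelled as an in-memory list, the rollover proposal is assumed to be the one whose 32-bit counter is at its maximum, and "fencing" is assumed to mean raising the accepted epoch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical per-node sketch: holds back next-epoch proposals until rollover commits. */
final class RolloverGate {
    static final long COUNTER_MASK = 0xffffffffL; // assumes a 32-bit counter

    private final List<Long> persisted = new ArrayList<>(); // stand-in for the txn log
    private final Queue<Long> blocked = new ArrayDeque<>(); // next-epoch proposals on hold
    private long acceptedEpoch;
    private boolean rolloverPending; // rollover proposal persisted but not yet committed
    private long rolloverZxid;

    RolloverGate(long epoch) { this.acceptedEpoch = epoch; }

    static long epochOf(long zxid) { return zxid >>> 32; }

    /** Assumption: the last proposal of an epoch has every counter bit set. */
    static boolean isRollover(long zxid) { return (zxid & COUNTER_MASK) == COUNTER_MASK; }

    void onProposal(long zxid) {
        if (rolloverPending && epochOf(zxid) > epochOf(rolloverZxid)) {
            blocked.add(zxid); // not written to disk before rollover is committed
            return;
        }
        persisted.add(zxid);
        if (isRollover(zxid)) {
            rolloverPending = true;
            rolloverZxid = zxid;
            acceptedEpoch = epochOf(zxid) + 1; // fence the next epoch
        }
    }

    void onCommit(long zxid) {
        if (rolloverPending && zxid == rolloverZxid) {
            rolloverPending = false;
            while (!blocked.isEmpty()) {
                persisted.add(blocked.remove()); // log the held-back proposals in order
            }
        }
    }

    long acceptedEpoch() { return acceptedEpoch; }
    int blockedCount() { return blocked.size(); }
    List<Long> persisted() { return persisted; }
}
```

In this sketch a node that has persisted the rollover proposal already reports the next epoch as accepted, and next-epoch proposals reach the log only after `onCommit` sees the rollover zxid.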

Here is the branch; I will draft a PR later.

https://github.com/kezhuw/zookeeper/tree/zxid-rollover


