Kezhu Wang created ZOOKEEPER-4883:
-------------------------------------

             Summary: Rollover leader epoch when counter part of zxid reach limit
                 Key: ZOOKEEPER-4883
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4883
             Project: ZooKeeper
          Issue Type: Improvement
          Components: server
            Reporter: Kezhu Wang

Currently, zxid rollover causes a re-election (ZOOKEEPER-1277), which is time-consuming. ZOOKEEPER-2789 proposes to use 24 bits for the epoch and 40 bits for the counter. I do think it is promising, as [it extends the rollover interval from 49.7 days to 34.9 years assuming 1k/s ops|https://github.com/apache/zookeeper/pull/2164#issuecomment-2368107479]. But I think it is a one-way ticket. The change of data format may also require a community-wide effort to upgrade third-party libraries/tools if they are tied to the current format in any way. Inside ZooKeeper, `acceptedEpoch` and `currentEpoch` are tied to `zxid`. Given a snapshot and a txn log, we probably need to deduce those two epoch values in order to join the quorum.

So, I propose an alternative solution: roll over the leader epoch when the counter part of the zxid reaches its limit.

# Treat the last proposal of an epoch as the rollover proposal.
# Requests from the next epoch are proposed normally.
# Fence the next epoch once the rollover proposal is persisted.
# Proposals from the next epoch are not written to disk before the rollover proposal is committed.
# The leader commits the rollover proposal once it gets quorum ACKs.
# Blocked proposals from the new epoch are logged once the rollover proposal is committed on the corresponding nodes.

This results in:

# No other leader can lead using the next epoch number once the rollover proposal is considered committed.
# No proposals from the next epoch are written to disk before the rollover proposal is considered committed.

Here is the branch; I will draft a PR later.

https://github.com/kezhuw/zookeeper/tree/zxid-rollover

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
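
For readers following the numbers in this issue, here is a minimal sketch of the zxid arithmetic involved. It is not code from the linked branch or from ZooKeeper itself: the class and method names are hypothetical, the layout is the current 32-bit epoch / 32-bit counter split, and it assumes that the first zxid after the rollover proposal is counter 0 of the next epoch. It covers only how the rollover proposal is identified and how the rollover interval is computed, not the fencing or commit steps.

```java
/** Hypothetical sketch, not ZooKeeper code: a zxid as a 64-bit value with the epoch in the high bits. */
final class ZxidRolloverSketch {
    // Current layout: 32-bit epoch in the high bits, 32-bit counter in the low bits.
    static final int COUNTER_BITS = 32;
    static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1;

    static long epochOf(long zxid) {
        return zxid >>> COUNTER_BITS;
    }

    static long counterOf(long zxid) {
        return zxid & COUNTER_MASK;
    }

    static long makeZxid(long epoch, long counter) {
        return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK);
    }

    /** The last zxid of an epoch, which step 1 treats as the rollover proposal. */
    static boolean isRolloverProposal(long zxid) {
        return counterOf(zxid) == COUNTER_MASK;
    }

    /**
     * Assumption: the zxid after the rollover proposal is counter 0 of epoch + 1,
     * which is also what a plain 64-bit increment produces.
     */
    static long nextZxid(long zxid) {
        return isRolloverProposal(zxid) ? makeZxid(epochOf(zxid) + 1, 0) : zxid + 1;
    }

    /** Days until a counter of the given width is exhausted at a steady write rate. */
    static double daysUntilRollover(int counterBits, double opsPerSecond) {
        double seconds = Math.pow(2, counterBits) / opsPerSecond;
        return seconds / 86_400.0;
    }
}
```

With these definitions, `daysUntilRollover(32, 1000)` comes to roughly 49.7 days and `daysUntilRollover(40, 1000) / 365` to roughly 34.9 years, which matches the figures quoted from the ZOOKEEPER-2789 discussion when a year is counted as 365 days.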