Fangmin Lv created ZOOKEEPER-3500:
-------------------------------------
Summary: Improving the ZAB UPTODATE semantic to only issue it to
learner when there is limited lagging
Key: ZOOKEEPER-3500
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3500
Project: ZooKeeper
Issue Type: Improvement
Components: server
Reporter: Fangmin Lv
Assignee: Fangmin Lv
With a large snapshot and high write RPS, when a learner is doing a SNAP sync
with the leader, there will be a lot of txns that need to be replayed between
the NEWLEADER and UPTODATE packets.
Depending on how big the snapshot is and how heavy the traffic is, our
benchmark shows it may take more than 30s to replay all those txns. This means
that when the learner processes the UPTODATE packet it is still 30s behind the
leader; at 10K txns/s, that is a lag of 300K txns.
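
The lag figure above is simple arithmetic; a minimal sketch of it follows. The
function name and parameters are made up for illustration and are not part of
ZooKeeper.

```python
def lagging_txns(replay_seconds: float, write_rate_per_sec: float) -> int:
    """Txns the leader commits while the learner is still replaying.

    Hypothetical helper: assumes a constant write rate for the whole
    replay window, which is a simplification of real traffic.
    """
    return int(replay_seconds * write_rate_per_sec)


# Numbers taken from the description: 30s of replay at 10K txns/s.
lag = lagging_txns(replay_seconds=30, write_rate_per_sec=10_000)
print(lag)  # 300000
```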
The learner starts serving client traffic right after it receives the UPTODATE
packet, which means clients will see a lot of stale data.
The idea here is to check on the leader side and only send the UPTODATE packet
when the learner is lagging by a limited number of txns. This doesn't change
the ZAB protocol; it only changes when ZK applies the txns between NEWLEADER
and UPTODATE.
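
As a rough illustration of the proposed check, the sketch below models a
leader that withholds UPTODATE until the learner's lag drops under a
threshold. It is a toy simulation, not ZooKeeper code: the class, its fields,
the `max_lag_txns` threshold and the per-round numbers are all assumptions
made for the example, and txns are tracked as plain counts rather than zxids.

```python
from dataclasses import dataclass


@dataclass
class LearnerSyncState:
    """Hypothetical leader-side view of one syncing learner."""
    leader_last_proposed: int  # txns the leader has proposed so far
    learner_last_acked: int    # txns the learner has replayed and acked
    max_lag_txns: int = 1000   # assumed threshold, not a real ZooKeeper setting

    def lag(self) -> int:
        return max(0, self.leader_last_proposed - self.learner_last_acked)

    def should_send_uptodate(self) -> bool:
        # Only let the learner start serving clients once the lag is small.
        return self.lag() <= self.max_lag_txns


def rounds_until_uptodate(state: LearnerSyncState,
                          replay_per_round: int,
                          incoming_per_round: int) -> int:
    """Simulate sync rounds until UPTODATE may be sent; return the round count."""
    if replay_per_round <= incoming_per_round:
        # The learner can never catch up, so the loop would not terminate.
        raise ValueError("learner must replay faster than the leader writes")
    rounds = 0
    while not state.should_send_uptodate():
        state.leader_last_proposed += incoming_per_round
        # The learner cannot ack more txns than the leader has proposed.
        state.learner_last_acked = min(
            state.leader_last_proposed,
            state.learner_last_acked + replay_per_round,
        )
        rounds += 1
    return rounds


# Start 300K txns behind, as in the example from the description.
state = LearnerSyncState(leader_last_proposed=300_000, learner_last_acked=0)
rounds = rounds_until_uptodate(state, replay_per_round=50_000,
                               incoming_per_round=10_000)
print(rounds, state.lag())
```

The point of the sketch is that the decision lives entirely on the leader: the
packet sequence the learner sees is unchanged, only the moment at which
UPTODATE is issued moves.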