Fangmin Lv created ZOOKEEPER-3618:
-------------------------------------
Summary: Send batch quorum Ack and Commit packets to improve the
efficiency and throughput of Zeus
Key: ZOOKEEPER-3618
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3618
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.6.0
Reporter: Fangmin Lv
Assignee: Fangmin Lv
Fix For: 3.6.0
ZK guarantees that txns are flushed to disk in order, and we already batch
flushes to improve disk IO efficiency and throughput. However, the ACKs are
still sent back one by one, which is inefficient. Instead, after each batch
flush, we can send the leader a single ACK for the last flushed txn.
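A minimal sketch of the learner-side idea, assuming the log is flushed in zxid
order. The class BatchedAckSender and its method are hypothetical names for
illustration, not ZooKeeper's actual API, and the "sent" list stands in for the
network:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: after a group flush of logged txns, send one ACK that
 * carries only the highest flushed zxid instead of one ACK per txn.
 */
class BatchedAckSender {
    // Zxids of ACK packets "sent" to the leader (stands in for the network).
    final List<Long> sentAcks = new ArrayList<>();

    /**
     * Called after a batch of txns (listed in zxid order) has been flushed.
     * Because the log is flushed in order, acking the last zxid implicitly
     * acks every txn before it in the batch.
     */
    void onBatchFlushed(List<Long> flushedZxids) {
        if (flushedZxids.isEmpty()) {
            return; // nothing was flushed, nothing to acknowledge
        }
        long lastZxid = flushedZxids.get(flushedZxids.size() - 1);
        sentAcks.add(lastZxid); // one ACK for the whole batch
    }
}
```

With batches [1, 2, 3] and [4, 5], only two ACKs (for zxids 3 and 5) go out
instead of five.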
On the leader, receiving the ACK for txn N means, by the flushing-order
guarantee, that all txns before N have also been flushed to disk, so they are
all ACKed. The leader can then maintain a (SID -> last ACKed ZXID) map to
calculate the latest COMMIT ZXID and send that to all learners.
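A minimal sketch of how such a map could yield the commit point, assuming a
simple majority quorum over a fixed set of voters (the leader counted as one of
them under its own SID). It does not model ZooKeeper's QuorumVerifier or
dynamic reconfig; CommitTracker and onAck are hypothetical names:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical leader-side bookkeeping: last acked zxid per SID, from which
 * the highest zxid acked by a majority of voters is derived.
 */
class CommitTracker {
    private final int voterCount;
    private final Map<Long, Long> lastAckedZxid = new HashMap<>();
    private long lastCommittedZxid = 0L;

    CommitTracker(int voterCount) {
        this.voterCount = voterCount;
    }

    /**
     * Records a (possibly batched) ACK from server sid. Returns the new commit
     * zxid if the commit point advanced, or -1 otherwise.
     */
    long onAck(long sid, long ackedZxid) {
        // ACKs from one server arrive in zxid order; keep the max defensively.
        lastAckedZxid.merge(sid, ackedZxid, Math::max);

        int quorum = voterCount / 2 + 1;
        if (lastAckedZxid.size() < quorum) {
            return -1; // not enough servers have acked anything yet
        }
        long[] acked = lastAckedZxid.values().stream()
                .mapToLong(Long::longValue).toArray();
        Arrays.sort(acked); // ascending
        // The quorum-th largest value has been acked by at least `quorum` servers.
        long candidate = acked[acked.length - quorum];
        if (candidate > lastCommittedZxid) {
            lastCommittedZxid = candidate;
            return candidate; // broadcast COMMIT(candidate) to all learners
        }
        return -1;
    }
}
```

With three voters acking 5, 3 and then 7, the commit point moves to 3 and then
to 5; a single COMMIT covers every txn up to that zxid.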
By the same ordering guarantee, when a learner receives a COMMIT for txn N,
it knows that all txns before N have been committed as well.
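A minimal sketch of the learner applying a batched COMMIT, assuming pending
proposals are queued in zxid order. PendingCommitQueue is a hypothetical name,
and "applied" stands in for handing txns to the commit processor:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical learner-side sketch: a COMMIT for zxid N applies every queued
 * proposal whose zxid is <= N, in order.
 */
class PendingCommitQueue {
    private final Deque<Long> pending = new ArrayDeque<>();
    final List<Long> applied = new ArrayList<>();

    // Proposals arrive from the leader in zxid order.
    void onProposal(long zxid) {
        pending.addLast(zxid);
    }

    // Commit everything up to and including commitZxid.
    void onCommit(long commitZxid) {
        while (!pending.isEmpty() && pending.peekFirst() <= commitZxid) {
            applied.add(pending.pollFirst());
        }
    }
}
```

After proposals 1 through 5, COMMIT(3) applies 1, 2, 3 and a later COMMIT(5)
applies 4 and 5.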
The main benefit of this feature is reduced memory pressure, GC, and quorum
communication effort on all servers, as well as less lock contention on the
leader when processing ACK, COMMIT, etc.
Overall, this will improve the efficiency of ZK, and we expect it to support
higher write throughput.
The main challenge of this work is ensuring backward compatibility and safety
for a gradual rollout, while making sure it does not affect the
correctness/durability of txns during dynamic reconfig.