[ https://issues.apache.org/jira/browse/ZOOKEEPER-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ZOOKEEPER-3657:
--------------------------------------
Labels: pull-request-available (was: )
> Implementing snapshot schedule to avoid high latency issue due to disk contention
> ----------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-3657
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3657
> Project: ZooKeeper
> Issue Type: New Feature
> Components: server
> Reporter: Fangmin Lv
> Assignee: Fangmin Lv
> Priority: Major
> Labels: pull-request-available
>
> If a ZK server runs on a machine with a single disk drive, the snapshot thread
> and the txn fsync thread contend for disk IO (even on SSD). Because a write
> commits only after a quorum has fsynced it, a majority of servers taking
> snapshots at the same time slows down txn fsync where it matters, and hence
> raises the end-to-end update and read latency.
>
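Why a majority matters can be seen with a simplified model (an assumption, not ZooKeeper code): if a write commits once a quorum has fsynced it, commit latency is roughly the quorum-th smallest fsync latency in the ensemble. The class name and the 2 ms / 40 ms figures are hypothetical, and network and CPU time are ignored.

```java
import java.util.Arrays;

// Simplified model (an assumption, not ZooKeeper code): a write commits once a
// quorum has fsynced it, so commit latency is approximated by the quorum-th
// smallest per-server fsync latency. Network and CPU time are ignored.
final class QuorumLatencyModel {

    // Smallest number of servers that forms a majority of an ensemble of size n.
    static int quorumSize(int n) {
        return n / 2 + 1;
    }

    // Approximate commit latency: the quorum-th smallest fsync latency.
    static double commitLatencyMs(double[] fsyncLatenciesMs) {
        double[] sorted = fsyncLatenciesMs.clone();
        Arrays.sort(sorted);
        return sorted[quorumSize(sorted.length) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical figures: 2 ms fsync on an idle disk, 40 ms while a snapshot competes for it.
        double[] minoritySnapshotting = {2, 2, 2, 40, 40};
        double[] majoritySnapshotting = {2, 2, 40, 40, 40};
        System.out.println(commitLatencyMs(minoritySnapshotting)); // prints 2.0
        System.out.println(commitLatencyMs(majoritySnapshotting)); // prints 40.0
    }
}
```

With two of five servers snapshotting, three unaffected servers still form a quorum and the model's commit latency stays at the idle fsync time; with three snapshotting, every quorum includes a slow server.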
> To provide better latency (SLA) guarantees and improve write throughput with
> large snapshots (> 3GB), the server implements an internal snapshot scheduler
> that prevents a majority of servers from taking snapshots at the same time.
>
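One way to read "prevent a majority from taking snapshots at the same time" is as a budget: keep enough servers free of snapshot IO to form a quorum, so at most n - (n/2 + 1) servers snapshot at once. The sketch below assumes that reading; SnapshotBudget and its methods are hypothetical names, not part of ZooKeeper.

```java
import java.util.Set;

// Hypothetical admission check, assuming the scheduler's goal is to keep a
// quorum of servers free of snapshot IO. Not a ZooKeeper API.
final class SnapshotBudget {

    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Most servers that may snapshot at once while the rest still form a quorum.
    static int maxConcurrentSnapshots(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    // True if server sid may start a snapshot, given who is already snapshotting.
    static boolean maySnapshot(long sid, Set<Long> snapshotting, int ensembleSize) {
        if (snapshotting.contains(sid)) {
            return true; // already counted in the current budget
        }
        return snapshotting.size() < maxConcurrentSnapshots(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5, 7}) {
            System.out.println(n + " servers -> at most " + maxConcurrentSnapshots(n) + " snapshotting");
        }
        System.out.println(maySnapshot(3L, Set.of(1L), 5));     // prints true
        System.out.println(maySnapshot(4L, Set.of(1L, 2L), 5)); // prints false
    }
}
```

For an even-sized ensemble this is stricter than "not a majority": with 4 servers, two snapshotting servers are not a majority, but the remaining two cannot form a quorum either, so the budget is 1.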
> This feature introduces a new quorum packet type, SNAPPING. The leader sends
> it to the followers periodically, like PING but less frequently. Each follower
> replies with its current status, such as the number of txns since its last
> snapshot and its fsync latency, and the leader decides which servers should
> take a snapshot.
>
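A minimal sketch of the leader-side decision after a SNAPPING round, assuming the leader simply picks the followers furthest past a txn threshold, limited so that the servers not snapshotting still form a quorum. FollowerStatus, pickSnapshotters, and the reported fields are hypothetical; the actual patch may weigh the reports (for example fsync latency) differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical leader-side scheduling step; names and policy are assumptions.
final class SnapshotScheduler {

    // Status a follower might report in reply to SNAPPING (fields assumed).
    static final class FollowerStatus {
        final long sid;
        final long txnsSinceSnapshot;
        final double fsyncLatencyMs; // carried along but unused by this simple policy

        FollowerStatus(long sid, long txnsSinceSnapshot, double fsyncLatencyMs) {
            this.sid = sid;
            this.txnsSinceSnapshot = txnsSinceSnapshot;
            this.fsyncLatencyMs = fsyncLatencyMs;
        }
    }

    // Picks followers past snapshotThreshold, most txns first, never letting the
    // snapshotting servers grow beyond ensembleSize - quorum.
    static List<Long> pickSnapshotters(List<FollowerStatus> reports, long snapshotThreshold,
                                       int ensembleSize, int alreadySnapshotting) {
        int quorum = ensembleSize / 2 + 1;
        int budget = ensembleSize - quorum - alreadySnapshotting;
        List<FollowerStatus> due = new ArrayList<>();
        for (FollowerStatus s : reports) {
            if (s.txnsSinceSnapshot >= snapshotThreshold) {
                due.add(s);
            }
        }
        // Most overdue follower first.
        due.sort(Comparator.comparingLong((FollowerStatus s) -> s.txnsSinceSnapshot).reversed());
        List<Long> chosen = new ArrayList<>();
        for (int i = 0; i < due.size() && i < budget; i++) {
            chosen.add(due.get(i).sid);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<FollowerStatus> reports = List.of(
                new FollowerStatus(2, 120_000, 3.0),
                new FollowerStatus(3, 90_000, 2.5),
                new FollowerStatus(4, 150_000, 4.0),
                new FollowerStatus(5, 10_000, 2.0));
        // 5-server ensemble, nobody snapshotting yet, so the budget is 2.
        System.out.println(pickSnapshotters(reports, 100_000, 5, 0)); // prints [4, 2]
    }
}
```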
> While the leader is sending SNAPPING, a follower enables safe snapshot mode:
> it takes a snapshot on its own only if its txn count is much larger than the
> threshold defined for SyncRequestProcessor. This keeps a follower from
> accumulating too many txns before it is scheduled to take a snapshot.
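The follower-side rule can be sketched as a threshold that is raised while SNAPPING keeps arriving. SyncRequestProcessor is the ZooKeeper class that logs txns and triggers snapshots; the multiplier standing in for "much larger", the timeout after which safe mode is assumed to lapse, and the class and method names below are all hypothetical.

```java
// Hypothetical follower-side policy for the safe snapshot mode. The class,
// methods, multiplier and timeout are assumptions for illustration; not thread-safe.
final class FollowerSnapshotPolicy {

    private final long snapThreshold;     // normal threshold, e.g. the SyncRequestProcessor one
    private final long safeModeFactor;    // assumed meaning of "much larger" in safe mode
    private final long snappingTimeoutMs; // assumed: safe mode lapses if SNAPPING stops arriving
    private long lastSnappingMs = Long.MIN_VALUE; // MIN_VALUE means "never received"

    FollowerSnapshotPolicy(long snapThreshold, long safeModeFactor, long snappingTimeoutMs) {
        this.snapThreshold = snapThreshold;
        this.safeModeFactor = safeModeFactor;
        this.snappingTimeoutMs = snappingTimeoutMs;
    }

    // Record that a SNAPPING packet from the leader arrived at nowMs.
    void onSnapping(long nowMs) {
        lastSnappingMs = nowMs;
    }

    // Safe mode is on while the leader has sent SNAPPING recently.
    boolean inSafeMode(long nowMs) {
        return lastSnappingMs != Long.MIN_VALUE && nowMs - lastSnappingMs <= snappingTimeoutMs;
    }

    // Whether to take a snapshot on the follower's own initiative, i.e. without being scheduled.
    boolean shouldSnapshotLocally(long txnsSinceSnapshot, long nowMs) {
        long limit = inSafeMode(nowMs) ? snapThreshold * safeModeFactor : snapThreshold;
        return txnsSinceSnapshot > limit;
    }

    public static void main(String[] args) {
        FollowerSnapshotPolicy p = new FollowerSnapshotPolicy(100_000, 4, 30_000);
        System.out.println(p.shouldSnapshotLocally(150_000, 0));      // true: no SNAPPING yet
        p.onSnapping(1_000);
        System.out.println(p.shouldSnapshotLocally(150_000, 2_000));  // false: safe mode, limit 400,000
        System.out.println(p.shouldSnapshotLocally(450_000, 2_000));  // true: far past the raised limit
        System.out.println(p.shouldSnapshotLocally(150_000, 40_000)); // true: SNAPPING has lapsed
    }
}
```

The raised limit leaves the leader room to schedule the snapshot, while still acting as a safety net if scheduling falls behind.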
--
This message was sent by Atlassian Jira
(v8.3.4#803005)