[jira] [Commented] (ZOOKEEPER-3657) Implementing snapshot schedule to avoid high latency issue due to disk contention

Fangmin Lv (Jira) Thu, 19 Dec 2019 19:14:20 -0800


    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000575#comment-17000575
 ]


Fangmin Lv commented on ZOOKEEPER-3657:
---------------------------------------

 [~eolivelli] I've removed the 3.6.0 tag, thanks for the reminder.

>  Implementing snapshot schedule to avoid high latency issue due to disk 
> contention
> ----------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3657
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3657
>             Project: ZooKeeper
>          Issue Type: New Feature
>          Components: server
>            Reporter: Fangmin Lv
>            Assignee: Fangmin Lv
>            Priority: Major
>
> If a ZK server runs on a machine with a single disk drive, the snapshot and 
> txn fsync threads contend for disk IO (even on SSD). When a majority of the 
> servers take snapshots at the same time, the txn fsync time suffers, and with 
> it the end-to-end update and read latency.
> To provide a better SLA guarantee and improve write throughput with large 
> snapshots (> 3GB), a snapshot scheduler is implemented internally to avoid a 
> majority taking snapshots at the same time, which gives a better latency 
> guarantee.
> This feature introduces a new quorum packet type, SNAPPING. The leader sends 
> this packet to the followers periodically, like PING but less frequently. 
> Followers send back their current status, such as the maximum txns since the 
> last snapshot and the fsync latency, and the leader decides who should take a 
> snapshot.
> A follower enables safe snapshot mode if the leader is sending SNAPPING. In 
> that mode the follower takes a snapshot only if its txn count is much larger 
> than the threshold we defined for SyncRequestProcessor; this avoids issues 
> such as a follower accumulating too many txns before it is scheduled to take 
> a snapshot.
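
The scheduling described above can be sketched roughly as follows. This is
only an illustration, not the code from the patch: the class
SnapshotScheduleSketch, the FollowerStatus fields, the "pick the followers
with the most txns" policy, the minority cap of (ensembleSize - 1) / 2, and
the safeFactor multiplier are all assumptions made for the example. The
non-safe branch is also a simplification of what SyncRequestProcessor does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SnapshotScheduleSketch {

    /** Hypothetical status a follower reports in reply to a SNAPPING packet. */
    static final class FollowerStatus {
        final long sid;                   // server id
        final long txnsSinceLastSnapshot; // txns logged since the last snapshot
        final long fsyncLatencyMs;        // reported, but not used by this sketch

        FollowerStatus(long sid, long txnsSinceLastSnapshot, long fsyncLatencyMs) {
            this.sid = sid;
            this.txnsSinceLastSnapshot = txnsSinceLastSnapshot;
            this.fsyncLatencyMs = fsyncLatencyMs;
        }
    }

    /**
     * Leader side: choose which followers may snapshot now.
     * Assumed policy: only followers at or above snapCount are candidates,
     * the ones with the most txns go first, and the number chosen stays
     * below a majority of the ensemble.
     */
    static List<Long> pickSnapshotters(List<FollowerStatus> statuses,
                                       int ensembleSize, long snapCount) {
        int maxConcurrent = (ensembleSize - 1) / 2; // strictly less than a majority
        List<FollowerStatus> candidates = new ArrayList<>();
        for (FollowerStatus s : statuses) {
            if (s.txnsSinceLastSnapshot >= snapCount) {
                candidates.add(s);
            }
        }
        candidates.sort(Comparator
                .comparingLong((FollowerStatus s) -> s.txnsSinceLastSnapshot)
                .reversed());
        List<Long> chosen = new ArrayList<>();
        for (FollowerStatus s : candidates) {
            if (chosen.size() >= maxConcurrent) {
                break;
            }
            chosen.add(s.sid);
        }
        return chosen;
    }

    /**
     * Follower side: decide whether to snapshot.
     * In safe mode (leader is sending SNAPPING) the follower waits to be
     * scheduled, unless its txn count exceeds snapCount * safeFactor.
     */
    static boolean followerShouldSnapshot(boolean safeMode, boolean scheduledByLeader,
                                          long txnsSinceLastSnapshot,
                                          long snapCount, int safeFactor) {
        if (!safeMode) {
            // Simplified stand-in for the usual local threshold check.
            return txnsSinceLastSnapshot >= snapCount;
        }
        return scheduledByLeader || txnsSinceLastSnapshot >= snapCount * safeFactor;
    }
}
```

With a 5-server ensemble and snapCount = 100000, at most two followers are
chosen per round under these assumptions, so a quorum is always left free of
snapshot IO. A follower that is never scheduled still snapshots once it
passes the larger safe-mode threshold.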



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
