[ 
https://issues.apache.org/jira/browse/MESOS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-770:
------------------------------------
    Labels: mesosphere  (was: )

> Rate control and randomization of Replicated Log catching-up
> ------------------------------------------------------------
>
>                 Key: MESOS-770
>                 URL: https://issues.apache.org/jira/browse/MESOS-770
>             Project: Mesos
>          Issue Type: Improvement
>          Components: replicated log
>            Reporter: Yan Xu
>              Labels: mesosphere
>
> When the log is catching up, either during recovery or after coordinator 
> failover, the Paxos protocol is run on multiple positions (possibly the 
> entire log).
> Currently the catch-up process is linear (one thread fills positions 
> one by one). What prevents us from catching up all positions concurrently 
> is that too much concurrency could have a negative impact on the network, 
> and the problem may be exacerbated by contention between multiple 
> recovering replicas and the coordinator.
> Rate control limits the number of positions on which a proposer 
> (recoverer or coordinator) seeks consensus concurrently. We can fill a 
> batch of positions at a time.
> Randomly picking the positions in each batch reduces the chance that 
> multiple proposers contend for the same position at the same time, which 
> causes conflicts and retries.
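
The batching and randomization proposed above could be sketched roughly as follows. This is only an illustration in Python, not the Mesos implementation: `random_batches`, `catch_up`, and the `fill_position` callback (a stand-in for running Paxos on one log position) are hypothetical names, and a thread pool is assumed as the concurrency mechanism.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def random_batches(positions, batch_size, rng=None):
    """Yield the positions in random order, in batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = rng or random.Random()
    pending = list(positions)
    # Shuffling means two proposers working on the same range are unlikely
    # to reach the same position at the same moment.
    rng.shuffle(pending)
    for start in range(0, len(pending), batch_size):
        yield pending[start:start + batch_size]


def catch_up(positions, fill_position, batch_size):
    """Fill every position with at most batch_size fills in flight.

    fill_position is a hypothetical callback that seeks consensus on a
    single position and returns once that position is filled.
    """
    filled = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for batch in random_batches(positions, batch_size):
            # Consuming the map result blocks until the whole batch is done,
            # so the next batch starts only after this one has finished.
            filled.extend(pool.map(fill_position, batch))
    return filled
```

Here `batch_size` is the rate-control knob: a value of 1 reproduces the current one-by-one behaviour (in random order), while larger values trade extra network load for faster catch-up. Retrying a position that lost to a competing proposer is left out of the sketch.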



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
