[ 
https://issues.apache.org/jira/browse/MESOS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-770:
-------------------------

    Description: 
When the log is catching up, either during recovery or after a coordinator 
failover, the Paxos protocol is run on multiple positions (possibly the entire 
log).

Currently the catch-up process is linear (one thread fills positions one by 
one). We do not catch up all positions concurrently because too much 
concurrency could have a negative impact on the network, and the problem may 
be exacerbated by contention between multiple recovering replicas and the 
coordinator.

Rate control limits the number of positions for which a proposer (recoverer 
or coordinator) seeks consensus at the same time. We can process the positions 
in batches of bounded size.
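
To make the batching idea concrete, here is a minimal sketch in Python. It is an illustration only and not the Mesos implementation: the names batches, catch_up, fill and batch_size are hypothetical, and fill stands in for whatever runs Paxos on a single position.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


def batches(positions: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive groups of at most batch_size positions."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[int] = []
    for position in positions:
        batch.append(position)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def catch_up(positions: Iterable[int],
             fill: Callable[[int], None],
             batch_size: int) -> None:
    """Fill positions batch by batch.

    `fill` is a hypothetical callback that reaches consensus on one
    position. At most batch_size calls to it run at the same time.
    """
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for batch in batches(positions, batch_size):
            # list() drains the iterator, so the whole batch finishes
            # (or raises) before the next batch is submitted.
            list(pool.map(fill, batch))
```

With batch_size set to 1 this degenerates to the current linear behaviour; larger values trade network load for catch-up speed.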

Randomly picking the positions in each batch reduces the chance that multiple 
proposers contend for the same position at the same time, which causes 
conflicts and retries.
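
One possible way to randomize the batches, sketched in Python under the assumption that each proposer knows the full set of positions it still has to fill. The function random_batches and its parameters are hypothetical and only illustrate the idea; the issue does not prescribe a particular scheme.

```python
import random
from typing import Iterable, List, Optional


def random_batches(positions: Iterable[int],
                   batch_size: int,
                   rng: Optional[random.Random] = None) -> List[List[int]]:
    """Split positions into batches of at most batch_size, in random order.

    Each proposer shuffles independently, so two proposers are unlikely
    to work on the same position at the same moment.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = rng or random.Random()
    remaining = list(positions)
    rng.shuffle(remaining)  # in-place shuffle of this proposer's copy
    return [remaining[i:i + batch_size]
            for i in range(0, len(remaining), batch_size)]
```

Every position still appears in exactly one batch, so randomization changes only the order in which positions are attempted, not which positions get filled.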

  was:
When the log is catching up either in the process of recovering or after 
coordinator failover the Paxos protocol is run on multiple positions (possibly 
the entire log) concurrently. Too much concurrency could have negative impact 
on the network and the problem may be exacerbated by the contention among 
between multiple recovering replicas and the coordinator.

Rate control helps limit the number of concurrent positions a proposer 
(recoverer or coordinator) seeks consensus at a time. We can batch a number of 
positions each time.

Randomly picking the positions in each batch reduces the possibility that 
multiple proposers contend for the same position at the same time which causes 
conflict and retries.


> Rate control and randomization of Replicated Log catching-up
> ------------------------------------------------------------
>
>                 Key: MESOS-770
>                 URL: https://issues.apache.org/jira/browse/MESOS-770
>             Project: Mesos
>          Issue Type: Improvement
>          Components: replicated log
>            Reporter: Yan Xu



--
This message was sent by Atlassian JIRA
(v6.1#6144)
