[
https://issues.apache.org/jira/browse/MESOS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Artem Harutyunyan updated MESOS-770:
------------------------------------
Labels: mesosphere (was: )
> Rate control and randomization of Replicated Log catching-up
> ------------------------------------------------------------
>
> Key: MESOS-770
> URL: https://issues.apache.org/jira/browse/MESOS-770
> Project: Mesos
> Issue Type: Improvement
> Components: replicated log
> Reporter: Yan Xu
> Labels: mesosphere
>
> When the log is catching up, either while recovering or after a
> coordinator failover, the Paxos protocol is run on multiple positions
> (possibly the entire log).
> Currently the catch-up process is linear (one thread fills positions
> one by one). What prevents us from catching up on all positions concurrently
> is that too much concurrency could have a negative impact on the network, and
> the problem may be exacerbated by contention between multiple recovering
> replicas and the coordinator.
> Rate control limits the number of positions on which a proposer
> (recoverer or coordinator) seeks consensus concurrently. We can batch a number
> of positions each time.
> Randomly picking the positions in each batch reduces the chance that
> multiple proposers contend for the same position at the same time, which
> causes conflicts and retries.
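
A minimal sketch of the batching and randomization proposed above, written in
Python for brevity. It is not the Mesos implementation: random_batches,
catch_up, fill and batch_size are hypothetical names, and fill stands in for
whatever runs Paxos on a single position and reports whether it was learned.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def random_batches(positions, batch_size, rng=None):
    """Shuffle the positions and split them into batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = rng or random.Random()
    pending = list(positions)
    # Randomizing the order makes it less likely that two proposers
    # work on the same position at the same moment.
    rng.shuffle(pending)
    return [pending[i:i + batch_size] for i in range(0, len(pending), batch_size)]


def catch_up(positions, fill, batch_size, rng=None):
    """Call fill(position) for every position, at most batch_size at a time.

    fill is a hypothetical callback returning True once the position is
    filled. Returns the sorted positions for which fill returned False,
    so the caller can decide how to retry them.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for batch in random_batches(positions, batch_size, rng):
            # Rate control: wait for the whole batch before starting the next.
            results = list(pool.map(fill, batch))
            failed.extend(p for p, ok in zip(batch, results) if not ok)
    return sorted(failed)
```

Here batch_size is the rate-control knob: a value of 1 corresponds to the
current one-by-one behaviour, while a value equal to the number of missing
positions attempts them all at once. Retrying failed positions is left to the
caller in this sketch.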
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)