[ 
https://issues.apache.org/jira/browse/AURORA-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533622#comment-14533622
 ] 

Maxim Khutornenko commented on AURORA-1047:
-------------------------------------------

h2. Change Summary
h3. Definitions
_Implicit reconciliation_ - a scheduler request for status updates of *all* 
non-terminal tasks known to Mesos. May include status updates for tasks unknown 
to the scheduler. Achieved by calling \[1\] with an empty collection argument.
_Explicit reconciliation_ - a scheduler request for status updates of *all* 
non-terminal tasks known to the scheduler. May include status updates for tasks 
unknown to Mesos. Achieved by calling \[1\] with a non-empty collection argument.
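
As a rough illustration of the two definitions, the sketch below builds the argument for each kind of request. It is not the Mesos API: the real call \[1\] takes a collection of task statuses, while this sketch uses plain task ID strings, and the class and method names are hypothetical. One consequence of the definitions is that an explicit request for zero tasks would look the same as an implicit one, so the sketch returns nothing in that case.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class ReconciliationRequests {
    // Implicit: an empty collection asks for every non-terminal task known to Mesos.
    static List<String> implicitArgs() {
        return Collections.emptyList();
    }

    // Explicit: the non-terminal task IDs known to the scheduler (strings stand in
    // for task statuses here). An empty input yields Optional.empty(), because an
    // empty collection would be interpreted as an implicit request.
    static Optional<List<String>> explicitArgs(Collection<String> nonTerminalTaskIds) {
        if (nonTerminalTaskIds.isEmpty()) {
            return Optional.empty();
        }
        List<String> copy = new ArrayList<>(nonTerminalTaskIds);
        return Optional.of(Collections.unmodifiableList(copy));
    }
}
```

A caller would pass the result of either method as the collection argument, skipping the explicit run when the Optional is empty.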

h3. Motivation
Because state is distributed between the scheduler and the Mesos master, task 
state can drift under certain failure conditions (scheduler failover, master 
failover, an unresponsive slave, the end of a network partition, etc.). It is 
vital to have a reconciliation process in place that keeps task states in 
check, thus avoiding rogue duplicate instances and wasted resources. More on 
this here: \[2\].

Given the current Mesos reconciliation API, it isn't possible to reconcile 
global state via a single "diff" call (given the sheer amount of data to be 
processed on either side). Instead, a combination of explicit and implicit 
reconciliation approaches should achieve a global state sync. The table below 
summarizes which task state drifts each approach addresses:
|| Mesos v / Scheduler > || non-terminal || terminal || absent ||
| non-terminal | X | implicit | implicit |
| terminal | explicit | X | X |
| absent | explicit | X | X |
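
The table above can also be read as a small lookup function. The sketch below is one possible encoding, with hypothetical enum and method names; it assumes the X cells mean that neither approach applies to that combination.

```java
public class DriftTable {
    enum State { NON_TERMINAL, TERMINAL, ABSENT }
    enum Approach { IMPLICIT, EXPLICIT, NONE }

    // Returns the reconciliation approach that addresses the given combination
    // of task state as seen by Mesos and as seen by the scheduler.
    static Approach approachFor(State mesos, State scheduler) {
        boolean activeInMesos = mesos == State.NON_TERMINAL;
        boolean activeInScheduler = scheduler == State.NON_TERMINAL;
        if (activeInMesos && !activeInScheduler) {
            return Approach.IMPLICIT; // Mesos reports a task the scheduler considers done or unknown
        }
        if (!activeInMesos && activeInScheduler) {
            return Approach.EXPLICIT; // scheduler asks about a task Mesos considers done or unknown
        }
        return Approach.NONE; // the X cells (assumed: nothing for either approach to do)
    }
}
```

For example, a task that is non-terminal in Mesos but absent in the scheduler maps to implicit reconciliation, matching the first row of the table.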

The new reconciliation algorithm is intended to replace the GC executors \[3\], 
which require resources offered by Mesos in order to run on every slave.

h3. Implementation
The Aurora scheduler will have two new background threads that continuously 
run explicit and implicit task reconciliations on a non-overlapping schedule. 
By default, each will initially run hourly, with a 30-minute spread between 
the two. The assumption is that neither reconciliation takes longer than 30 
minutes to process status updates for all tasks known to the scheduler or 
Mesos. The feature will initially be delivered as optional and secondary to 
the existing GC executor reconciliation (though it is not intended to run in 
parallel with GC executors). It will become primary once the GC executor code 
is removed.
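
The schedule described above could be sketched with a standard Java scheduled executor, as below. This is only an illustration, not the Aurora implementation: the class and method names are hypothetical, and starting the explicit run first is an assumption. Note that the two runs stay non-overlapping only while each one finishes within the spread, which is the 30-minute assumption stated above.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ReconciliationSchedule {
    // Defaults taken from the description: hourly runs, 30 minutes apart.
    static final Duration DEFAULT_INTERVAL = Duration.ofMinutes(60);
    static final Duration DEFAULT_SPREAD = Duration.ofMinutes(30);

    // Starts both reconciliations on a two-thread pool. The explicit run fires
    // immediately and the implicit run is offset by `spread`; both then repeat
    // every `interval`. The caller is responsible for shutting the executor down.
    static ScheduledExecutorService start(
            Runnable explicit, Runnable implicit, Duration interval, Duration spread) {
        if (spread.isNegative() || spread.compareTo(interval) >= 0) {
            throw new IllegalArgumentException("spread must be in [0, interval)");
        }
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);
        executor.scheduleAtFixedRate(
                explicit, 0, interval.toMillis(), TimeUnit.MILLISECONDS);
        executor.scheduleAtFixedRate(
                implicit, spread.toMillis(), interval.toMillis(), TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

Calling `ReconciliationSchedule.start(explicitRun, implicitRun, DEFAULT_INTERVAL, DEFAULT_SPREAD)` with two caller-supplied runnables would give the hourly schedule with a 30-minute spread.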

\[1\] - https://github.com/apache/mesos/blob/2985ae05634038b70f974bbfed6b52fe47231418/src/java/src/org/apache/mesos/SchedulerDriver.java#L277
\[2\] - http://mesos.apache.org/documentation/latest/reconciliation/
\[3\] - https://github.com/apache/aurora/blob/ef0975655c04f0c2f3ecb6599d4e4beb9547f091/src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java

> Implement state reconciliation within the scheduler
> ---------------------------------------------------
>
>                 Key: AURORA-1047
>                 URL: https://issues.apache.org/jira/browse/AURORA-1047
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: brian wickman
>            Assignee: Maxim Khutornenko
>




