[
https://issues.apache.org/jira/browse/AURORA-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533622#comment-14533622
]
Maxim Khutornenko commented on AURORA-1047:
-------------------------------------------
h2. Change Summary
h3. Definitions
_Implicit reconciliation_ - a scheduler request for status updates of *all*
non-terminal tasks known to Mesos. May include status updates for tasks unknown
to the scheduler. Achieved by calling \[1\] with an empty collection argument.
_Explicit reconciliation_ - a scheduler request for status updates of *all*
non-terminal tasks known to the scheduler. May include status updates for tasks
unknown to Mesos. Achieved by calling \[1\] with a non-empty collection argument.
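As a rough illustration of the two definitions, the sketch below models only the
selection semantics of the call in \[1\]. {{FakeMaster}} and its {{reconcile}}
method are hypothetical stand-ins invented for this example, not the Mesos API,
and reporting unknown tasks as {{TASK_LOST}} is an assumption of the sketch.

```python
from typing import Dict, List, Tuple


class FakeMaster:
    """Hypothetical in-memory stand-in for the Mesos master (not the real API)."""

    def __init__(self, tasks: Dict[str, str]) -> None:
        # Maps task id -> latest state of each non-terminal task the master knows.
        self.tasks = dict(tasks)

    def reconcile(self, task_ids: List[str]) -> List[Tuple[str, str]]:
        """Return (task_id, state) status updates for a reconciliation request.

        Empty list     -> implicit: report every task known to the master.
        Non-empty list -> explicit: report only the requested tasks; tasks the
                          master does not know are reported as TASK_LOST
                          (an assumption made for this sketch).
        """
        if not task_ids:
            return sorted(self.tasks.items())
        return [(task_id, self.tasks.get(task_id, "TASK_LOST")) for task_id in task_ids]
```

With a master that knows tasks {{a}} and {{b}}, {{reconcile([])}} reports both
of them, including any the scheduler has never heard of, while
{{reconcile(["a", "zombie"])}} reports {{a}} plus a lost status for the task
that only the scheduler knows.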
h3. Motivation
Because state is distributed between the scheduler and the Mesos master, task
state can drift under certain failure conditions (scheduler failover, master
failover, unresponsive slave, network partition, etc.). It is vital to have a
reconciliation process in place that keeps task states in check, thus avoiding
rogue duplicate instances and resource waste. More on this here: \[2\].
The current Mesos reconciliation API does not allow reconciling global state
via a single "diff" call, because of the sheer amount of data that would have
to be processed on either side. Instead, a combination of explicit and implicit
reconciliation approaches should achieve a global state sync. The table below
summarizes the task state drifts addressed by each approach:
||Mesos v / Scheduler > || non-terminal || terminal || absent ||
| non-terminal | X | implicit | implicit |
| terminal | explicit | X | X |
| absent | explicit | X | X |
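The table can also be read as a small lookup: rows are the state Mesos holds
for a task, columns are the state the scheduler holds. The function below is
only an illustrative encoding of that table; the name {{covering_approach}} and
the use of {{None}} for cells marked X are choices made for this sketch.

```python
from typing import Optional

STATES = ("non-terminal", "terminal", "absent")


def covering_approach(mesos_state: str, scheduler_state: str) -> Optional[str]:
    """Return which reconciliation approach addresses the given state pair.

    Returns "implicit", "explicit", or None for the cells marked X in the table.
    """
    if mesos_state not in STATES or scheduler_state not in STATES:
        raise ValueError(f"unknown state pair: {mesos_state!r}, {scheduler_state!r}")
    if mesos_state == "non-terminal":
        # Mesos still runs the task; the scheduler considers it terminal or absent.
        return None if scheduler_state == "non-terminal" else "implicit"
    # Mesos considers the task terminal or absent; only a task the scheduler
    # still believes to be non-terminal is addressed, via an explicit request.
    return "explicit" if scheduler_state == "non-terminal" else None
```

For example, a task that Mesos is still running but the scheduler has no record
of maps to {{"implicit"}}, while a task the scheduler believes is running but
Mesos has no record of maps to {{"explicit"}}.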
The new reconciliation algorithm is intended to replace GC executors \[3\],
which need Mesos-offered resources in order to run on every slave.
h3. Implementation
The Aurora scheduler will have two new background threads that continuously
run explicit and implicit task reconciliations on a non-overlapping schedule.
The initial default schedules will be hourly runs with a 30-minute spread. The
assumption is that neither reconciliation should take longer than 30 minutes
to process status updates for all tasks known to the scheduler/Mesos. The
feature will initially be delivered as optional and secondary to the existing
GC executor reconciliation (though it is not intended to run in parallel with
GC executors). It will become primary once the GC executor code is removed.
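To make the timing concrete, here is a minimal sketch of how hourly runs with a
30-minute spread interleave. The function name, its parameters, and the choice
to start with the explicit run are assumptions for illustration; the comment
above does not say which reconciliation runs first.

```python
from typing import List, Tuple


def reconciliation_schedule(
    interval_min: int = 60, spread_min: int = 30, horizon_min: int = 180
) -> List[Tuple[int, str]]:
    """Return (start_minute, kind) pairs for runs starting before horizon_min.

    Each kind repeats every interval_min minutes; implicit runs are offset from
    explicit runs by spread_min so that the two never start at the same time.
    """
    if not 0 < spread_min < interval_min:
        raise ValueError("spread must be positive and shorter than the interval")
    runs: List[Tuple[int, str]] = []
    for start in range(0, horizon_min, interval_min):
        # Assumption: the explicit run opens each cycle.
        runs.append((start, "explicit"))
        if start + spread_min < horizon_min:
            runs.append((start + spread_min, "implicit"))
    return runs
```

With the defaults, the runs alternate every 30 minutes, which is why each
reconciliation is assumed to finish within 30 minutes before the other kind
starts.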
\[1\] - https://github.com/apache/mesos/blob/2985ae05634038b70f974bbfed6b52fe47231418/src/java/src/org/apache/mesos/SchedulerDriver.java#L277
\[2\] - http://mesos.apache.org/documentation/latest/reconciliation/
\[3\] - https://github.com/apache/aurora/blob/ef0975655c04f0c2f3ecb6599d4e4beb9547f091/src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
> Implement state reconciliation within the scheduler
> ---------------------------------------------------
>
> Key: AURORA-1047
> URL: https://issues.apache.org/jira/browse/AURORA-1047
> Project: Aurora
> Issue Type: Task
> Components: Scheduler
> Reporter: brian wickman
> Assignee: Maxim Khutornenko
>