Hi All,

I would like to verify that my understanding of task failures and executor
failures in Mesos is correct.

I am assuming the following:

* Mesos trusts a custom executor to report task status.
  If a task completes or fails but the executor does not call
  ExecutorDriver.sendStatusUpdate() with TASK_FINISHED/TASK_FAILED, then
  Mesos will assume that the task is still running.

* Mesos does not use task status updates sent via
  ExecutorDriver.sendStatusUpdate() as a heartbeat.
  For example, in MyriadExecutor we report the NMTask status as TASK_RUNNING
  after launching the NM. We report TASK_FINISHED/TASK_FAILED only after the
  process has terminated. There is no call to
  ExecutorDriver.sendStatusUpdate() in between. I am assuming that this does
  not cause Mesos to think that the task has been lost after some timeout
  interval.

* If an executor dies, Mesos considers all tasks launched by that executor
  lost. The scheduler will receive one call to executorLost() and one
  statusUpdate() call with state TASK_LOST for every task launched by
  that executor.
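
To make the second assumption concrete, here is a minimal sketch of the
reporting pattern I mean. It is not the actual MyriadExecutor code and it does
not use the Mesos library: the TaskState enum, the sendStatusUpdate callback
(standing in for ExecutorDriver.sendStatusUpdate()) and the waitForExit
supplier (standing in for blocking on the NM process) are all hypothetical
stand-ins. Only the state names mirror Mesos.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntSupplier;

// Hypothetical stand-in for the Mesos task states used in this message.
enum TaskState { TASK_RUNNING, TASK_FINISHED, TASK_FAILED }

class TaskReportingSketch {

    // sendStatusUpdate: stand-in for ExecutorDriver.sendStatusUpdate().
    // waitForExit: stand-in for blocking until the launched process exits,
    // returning its exit code.
    static void runTask(Consumer<TaskState> sendStatusUpdate, IntSupplier waitForExit) {
        // One update right after launch...
        sendStatusUpdate.accept(TaskState.TASK_RUNNING);

        int exitCode;
        try {
            // ...then nothing is sent while the process is alive.
            exitCode = waitForExit.getAsInt();
        } catch (RuntimeException e) {
            sendStatusUpdate.accept(TaskState.TASK_FAILED);
            return;
        }

        // Exactly one terminal update once the process has terminated.
        sendStatusUpdate.accept(exitCode == 0 ? TaskState.TASK_FINISHED
                                              : TaskState.TASK_FAILED);
    }

    // Helper that records every update the sketch would have sent.
    static List<TaskState> record(IntSupplier waitForExit) {
        List<TaskState> sent = new ArrayList<>();
        runTask(sent::add, waitForExit);
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(record(() -> 0)); // clean exit
        System.out.println(record(() -> 1)); // non-zero exit
    }
}
```

In this sketch a task produces exactly two updates over its lifetime, however
long the process runs, which is why the question of whether Mesos expects
periodic updates matters.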

Please let me know if any of my assumptions are incorrect.

Regards
Swapnil
