Hi All,

I would like to verify that my understanding of task failures and executor failures in Mesos is correct.
I am assuming the following:

* Mesos trusts a custom executor to report task status. If a task completes or fails but the executor does not call ExecutorDriver.sendStatusUpdate() with TASK_FINISHED or TASK_FAILED, Mesos will assume that the task is still running.
* Mesos does not treat the status updates sent via ExecutorDriver.sendStatusUpdate() as a heartbeat. For example, in MyriadExecutor we report the NMTask status as TASK_RUNNING after launching the NM, and we report TASK_FINISHED or TASK_FAILED only after the process has terminated. There is no call to ExecutorDriver.sendStatusUpdate() in between. I am assuming that this does not cause Mesos to conclude, after some timeout interval, that the task has been lost.
* If an executor dies, Mesos considers all tasks launched by that executor lost. The scheduler will receive one call to executorLost() and one statusUpdate() call with state TASK_LOST for every task launched by that executor.

Please let me know if any of my assumptions are incorrect.

Regards,
Swapnil
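P.S. In case it helps to make the second assumption concrete, here is a rough sketch of the reporting pattern I am describing. It deliberately does not use the real Mesos classes: StatusSink is a hypothetical stand-in for the call to ExecutorDriver.sendStatusUpdate(), the Callable stands in for the launched process and its exit code, and the State enum only mirrors the task state names (I use TASK_FINISHED for the success state, which I believe is the name in the Mesos TaskState enum). It is an illustration of the sequence of updates, not the actual MyriadExecutor code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class StatusReportingSketch {
    // Stand-in for the task states discussed above; not the Mesos TaskState type.
    enum State { TASK_RUNNING, TASK_FINISHED, TASK_FAILED }

    // Hypothetical stand-in for ExecutorDriver.sendStatusUpdate(); not the Mesos API.
    interface StatusSink {
        void send(String taskId, State state);
    }

    // Reports TASK_RUNNING once at launch, then exactly one terminal state
    // after the work has ended. Nothing is sent in between, so there is
    // no heartbeat-style traffic while the task runs.
    static void runTask(String taskId, Callable<Integer> work, StatusSink sink) {
        sink.send(taskId, State.TASK_RUNNING);
        State terminal;
        try {
            // The Callable's return value plays the role of a process exit code.
            terminal = (work.call() == 0) ? State.TASK_FINISHED : State.TASK_FAILED;
        } catch (Exception e) {
            // Any exception from the work is reported as a failed task.
            terminal = State.TASK_FAILED;
        }
        sink.send(taskId, terminal);
    }

    public static void main(String[] args) {
        List<State> seen = new ArrayList<>();
        // "nm-task-1" is an illustrative task id; the lambda simulates a clean exit.
        runTask("nm-task-1", () -> 0, (id, state) -> seen.add(state));
        System.out.println(seen); // prints [TASK_RUNNING, TASK_FINISHED]
    }
}
```

If runTask were never to reach the second send() call (for example because the executor forgot to report), only TASK_RUNNING would ever be seen, which is the situation my first assumption is about.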
