Status acknowledgements in MesosExecutor

Evers Benno Tue, 03 May 2016 05:50:15 -0700

Hi,

I was wondering about the semantics of the Executor::sendStatusUpdate()
method. It is described as


    // Sends a status update to the framework scheduler, retrying as
    // necessary until an acknowledgement has been received or the
    // executor is terminated (in which case, a TASK_LOST status update
    // will be sent). See Scheduler::statusUpdate for more information
    // about status update acknowledgements.

I understood this to mean that the function blocks until an
acknowledgement is received, but looking at the implementation of
MesosExecutor, "retrying as necessary" seems to mean only that
unacknowledged updates are re-sent when the slave reconnects
(exec/exec.cpp:274).

I'm asking because we're currently running a Python executor which
ends its life like this:

    driver.sendStatusUpdate(_create_task_status(TASK_FINISHED))
    driver.stop()
    # in a different thread:
    sys.exit(0 if driver.run() == mesos_pb2.DRIVER_STOPPED else 1)

and we're seeing situations (roughly once per 10,000 tasks) where the
executor process is reaped before the acknowledgement for TASK_FINISHED
is sent from slave to executor. This results in Mesos generating a
TASK_FAILED status update, probably from
Slave::sendExecutorTerminatedStatusUpdate().
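
To make the suspected race concrete, here is a toy model of it as I
understand it. Nothing in it is Mesos API: ToySlave, run_executor and
wait_for_ack are hypothetical names, and the rule "an exit with an
unacknowledged update is reported as TASK_FAILED" is my assumption
based on the symptom described above, not something I have confirmed
in the slave code.

```python
# Toy model only: these names are hypothetical and not part of the Mesos API.
from collections import deque

TASK_FINISHED = "TASK_FINISHED"
TASK_FAILED = "TASK_FAILED"


class ToySlave:
    """Stands in for the slave: receives updates and acknowledges them later."""

    def __init__(self):
        self.pending = deque()  # updates received but not yet acknowledged
        self.reported = []      # terminal states the framework ends up seeing

    def receive(self, update):
        self.pending.append(update)

    def deliver_acks(self):
        # Acknowledge every update that is still pending.
        while self.pending:
            self.reported.append(self.pending.popleft())

    def executor_exited(self):
        # Assumed behaviour: if the executor exits while an update is still
        # unacknowledged, the task is reported as failed instead.
        if self.pending:
            self.pending.clear()
            self.reported.append(TASK_FAILED)


def run_executor(slave, wait_for_ack):
    # Analogous to sendStatusUpdate(): hands the update over and returns.
    slave.receive(TASK_FINISHED)
    if wait_for_ack:
        # The acknowledgement arrives before the process exits.
        slave.deliver_acks()
    # Analogous to the executor process being reaped.
    slave.executor_exited()
    return slave.reported
```

In this model, run_executor(ToySlave(), wait_for_ack=False) ends with
TASK_FAILED, while wait_for_ack=True ends with TASK_FINISHED, which is
the ordering dependence I suspect we are hitting.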

So, did I misunderstand how MesosExecutor works? Or is it indeed a race,
and we have to change the executor shutdown?

Best regards,
Benno
