Hi David,

You are right that all of these are what we call "terminal" status
updates, and Mesos takes specific actions when it receives or generates one
of them.

TASK_LOST is special in the sense that it is not generated by the executor,
but by the slave/master. You could think of it as an exception in Mesos.
Clearly, these should be rare in a stable Mesos system.

What do your logs say about the TASK_LOSTs? Is it always the same issue?
Are you running with cgroups?
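
In the meantime, here is a minimal sketch of how a scheduler could avoid
starting more than one spare per task when it sees repeated terminal
updates. It is only an illustration: SpareTracker and on_status_update are
hypothetical names, not part of the Mesos API, and the states are compared
as plain strings. It also assumes that a task ID is never reused. It does
not explain why the TASK_LOSTs show up in the first place.

```python
# Terminal task states discussed in this thread (Mesos TaskState names).
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST"}


class SpareTracker:
    """Hypothetical helper: decides when a replacement task is needed."""

    def __init__(self):
        # IDs of tasks for which a replacement was already requested.
        self.replaced = set()

    def on_status_update(self, task_id, state):
        """Return True if the caller should launch exactly one spare."""
        if state not in TERMINAL_STATES:
            return False  # task is still staging/running; nothing to do
        if task_id in self.replaced:
            return False  # duplicate terminal update; already handled
        self.replaced.add(task_id)
        return True


tracker = SpareTracker()
# Made-up sequence of (task ID, state) updates, for illustration only.
updates = [
    ("t1", "TASK_RUNNING"),
    ("t1", "TASK_LOST"),
    ("t1", "TASK_LOST"),
    ("t2", "TASK_FAILED"),
]
restarts = [tid for tid, state in updates if tracker.on_status_update(tid, state)]
print(restarts)
```

With this, the second TASK_LOST for the same task does not trigger another
spare. Whether a "lost" task that later reappears should be trusted again
is a separate policy decision that the sketch does not make.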



On Fri, May 17, 2013 at 2:04 PM, David Greenberg <[email protected]> wrote:

> Hello! Today I began working on a more advanced version of mesos-submit
> that will handle hot-spares.
>
> I was assuming that TASK_{FAILED,FINISHED,LOST,KILLED} were the status
> updates that meant that I needed to start a new spare process, as the
> monitored task was killed. However, I noticed that I often received
> TASK_LOSTs, and every 5 seconds, my scheduler would think its tasks had all
> died, so it'd restart too many. Nevertheless, the tasks would reappear
> later on, and I could see them in the web interface of Mesos, continuing to
> run.
>
> What is going on?
>
> Thanks!
> David
>
