Hi Olivier,

> Can we have "non terminal" errors, from the Mesos point of view, where the task should not be considered as over?
Not really. What you're seeing certainly looks like a bug: terminal updates should be terminal. It'll probably be hard to debug without more data ;)

As a wild guess, since you seem to be using custom task IDs, maybe you tried to start a task twice? The TASK_ERROR could then have been generated on the master in response to the duplicate task ID (or some other validation issue), and the TASK_FINISHED on the slave when the first task finished. I'm not sure off the top of my head, though, whether there are checks in Mesos that would catch this.

Best regards,

On Tue, Sep 19, 2017 at 7:47 AM, Olivier Sallou <olivier.sal...@irisa.fr> wrote:
> Hi,
>
> I found a strange behaviour on a cluster that I do not understand. I do
> not have access to the Mesos logs (not in my cluster), but has anyone faced
> this before?
>
> My framework uses the Docker containerizer. We had a task that sent
> TASK_ERROR to the framework (why not), but in reality the Docker container
> ran correctly on the Mesos slave, and then we received a TASK_FINISHED.
> So Mesos detected an error with the task, but it still detected the end of
> the task and sent the finished event at the end.
>
> How can Mesos detect an error but still watch the task and detect its
> end?
>
> Here are my framework logs:
> 2017-09-17 01:06:35,447 DEBUG [godocker-scheduler][Thread-1] Task 17820-0 is in state TASK_RUNNING
> 2017-09-17 01:06:46,286 DEBUG [godocker-scheduler][Thread-1] Task 17820-0 is in state TASK_ERROR
> 2017-09-17 02:13:44,537 DEBUG [godocker-scheduler][Thread-1] Task 17820-0 is in state TASK_FINISHED
>
> Unfortunately I did not log the "reason" for the ERROR, so I do not know
> what occurred, and at this stage I cannot reproduce the use case manually.
>
> Can we have "non terminal" errors, from the Mesos point of view, where the
> task should not be considered as over?
>
> Thanks
>
> Olivier

--
Benno Evers
Software Engineer, Mesosphere
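Since the "reason" of the TASK_ERROR was not logged, here is a minimal Python sketch of a framework-side guard that would have captured more data. Mesos's TaskStatus message does carry `reason`, `source` and `message` fields next to `state`; logging `source` in particular would presumably show whether the TASK_ERROR came from the master, as the duplicate-task-ID guess would predict. Everything else is an assumption for illustration: `StatusUpdate` and `StatusTracker` are hypothetical names, `StatusUpdate` is a plain stand-in for the protobuf rather than the real Mesos binding, and `TERMINAL_STATES` is assumed to match the terminal TaskStates of the Mesos version in use.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List, Optional

log = logging.getLogger("scheduler-sketch")

# Assumption: these mirror Mesos's terminal TaskStates; verify for your version.
TERMINAL_STATES = {
    "TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_ERROR",
    "TASK_LOST", "TASK_DROPPED", "TASK_GONE", "TASK_GONE_BY_OPERATOR",
}


@dataclass
class StatusUpdate:
    """Hypothetical stand-in for the TaskStatus fields worth logging."""
    task_id: str
    state: str
    reason: Optional[str] = None
    source: Optional[str] = None
    message: Optional[str] = None


class StatusTracker:
    """Hypothetical helper: remembers terminal updates, flags later ones."""

    def __init__(self) -> None:
        # First terminal update seen for each task id.
        self.terminal: Dict[str, StatusUpdate] = {}
        # Updates that arrived after their task was already terminal.
        self.anomalies: List[StatusUpdate] = []

    def on_update(self, update: StatusUpdate) -> None:
        # Log reason/source/message alongside the state, not just the state.
        log.debug("Task %s is in state %s (reason=%s, source=%s, message=%s)",
                  update.task_id, update.state, update.reason,
                  update.source, update.message)
        first = self.terminal.get(update.task_id)
        if first is not None:
            # A terminal update should be the last one for this task id.
            log.warning("Task %s got %s after terminal %s",
                        update.task_id, update.state, first.state)
            self.anomalies.append(update)
            return
        if update.state in TERMINAL_STATES:
            self.terminal[update.task_id] = update


if __name__ == "__main__":
    # Replay the sequence from the framework logs above.
    tracker = StatusTracker()
    for state in ("TASK_RUNNING", "TASK_ERROR", "TASK_FINISHED"):
        tracker.on_update(StatusUpdate("17820-0", state))
    print(tracker.terminal["17820-0"].state)      # TASK_ERROR
    print([u.state for u in tracker.anomalies])   # ['TASK_FINISHED']
```

Keeping the first terminal update rather than overwriting it preserves the original error (and its reason) even when a contradictory TASK_FINISHED arrives later, which is exactly the case that is hard to diagnose after the fact.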