Thanks Vinod and James! So I think the task state transition TASK_KILLING -> TASK_FINISHED is a bug, and we should change it to TASK_KILLING -> TASK_KILLED.
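
A minimal sketch of the semantics being agreed on here, written in Python for brevity rather than taken from the actual executor code. The names `TaskState` and `terminal_state` are hypothetical, and the TASK_FAILED branch for a non-zero exit code is an assumption that this thread does not discuss:

```python
from enum import Enum


class TaskState(Enum):
    # Only the terminal states relevant to this discussion.
    TASK_FINISHED = "TASK_FINISHED"
    TASK_FAILED = "TASK_FAILED"
    TASK_KILLED = "TASK_KILLED"


def terminal_state(exit_code: int, kill_requested: bool) -> TaskState:
    """Pick the terminal status update for a task that has exited.

    kill_requested is True when the scheduler asked for the task to be
    killed (i.e. the task had already moved to TASK_KILLING).
    """
    # A scheduler-initiated kill always ends in TASK_KILLED,
    # irrespective of the exit status.
    if kill_requested:
        return TaskState.TASK_KILLED
    # The task exited for its own reasons: exit code 0 means success.
    if exit_code == 0:
        return TaskState.TASK_FINISHED
    # Assumption: a non-zero exit without a kill is reported as a failure.
    return TaskState.TASK_FAILED
```

With this ordering of checks, a task that handles SIGTERM gracefully and exits with 0 after a scheduler kill is reported as TASK_KILLED, not TASK_FINISHED.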

Regards,
Qian Zhang

On Fri, Sep 22, 2017 at 3:27 PM, James Peach <jor...@gmail.com> wrote:

> > On Sep 21, 2017, at 10:12 PM, Vinod Kone <vi...@mesosphere.io> wrote:
> >
> > I think it makes sense for `TASK_KILLED` to be sent in response to a KILL
> > call irrespective of the exit status. IIRC, that was the original
> > intention.
>
> Those are the semantics we implement and expect in our scheduler and
> executor. The only time we emit TASK_KILLED is in response to a scheduler
> kill, and a scheduler kill always ends in a TASK_KILLED.
>
> The rationale for this is
> 1. We want to distinguish whether the task finished for its own reasons
>    (ie. not due to a scheduler kill)
> 2. The scheduler told us to kill the task and we did, so it was
>    TASK_KILLED (irrespective of any exit status)
>
> > On Thu, Sep 21, 2017 at 8:20 PM, Qian Zhang <zhq527...@gmail.com> wrote:
> >
> >> Hi Folks,
> >>
> >> I'd like to collect the feedbacks on the task state TASK_FINISHED.
> >> Currently the default and command executor will always send
> >> TASK_FINISHED as long as the exit code of task is 0, this cause an
> >> issue: when scheduler initiates a kill task, the executor will send
> >> SIGTERM to the task first, and if the task handles SIGTERM gracefully
> >> and exit with 0, the executor will send TASK_FINISHED for that task,
> >> so we will see the task state transition:
> >> TASK_KILLING -> TASK_FINISHED.
> >>
> >> This seems incorrect because we thought it should be
> >> TASK_KILLING -> TASK_KILLED, that's why we filed a ticket MESOS-7975
> >> <https://issues.apache.org/jira/browse/MESOS-7975> for it. However, I
> >> am not very sure if it is really a bug, because I think it depends on
> >> how we define the meaning of TASK_FINISHED, if it means the task is
> >> terminated successfully on its own without external interference, then
> >> I think it does not make sense for scheduler to receive a TASK_KILLING
> >> followed by a TASK_FINISHED since there is indeed an external
> >> interference (killing task is initiated by scheduler). However, if
> >> TASK_FINISHED means the task is terminated successfully for whatever
> >> reason (no matter it is killed or terminated on its own), then I think
> >> it is OK to receive a TASK_KILLING followed by a TASK_FINISHED.
> >>
> >> Please let us know your thoughts on this issue, thanks!
> >>
> >>
> >> Regards,
> >> Qian Zhang