> On June 17, 2014, 12:28 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 1219-1224
> > <https://reviews.apache.org/r/22313/diff/7/?file=607037#file607037line1219>
> >
> > Is this possible? AFAICT, since the task was added to the executor, the
> > executor shouldn't be removed between _runTask() and __runTask(). Even if
> > the executor terminates in between, this task should've been marked
> > 'terminated' but not 'completed' (i.e., waiting for an ACK) and hence the
> > executor won't be removed from the framework's map. Since there is a
> > pending executor, the framework shouldn't be removed.
> >
> > So this can be a CHECK_NOTNULL(framework) with a comment on why it can
> > be a check.
>
> Yifan Gu wrote:
>     Good point! I am currently trying to add a test that kills the framework
>     before containerizer->update() returns, to test this. Thanks for pointing
>     that out!
>
> Yifan Gu wrote:
>     Hi Vinod, I found that the framework does have a chance to be NULL here.
>     It seems that because the executor is not in framework->pending() at this
>     time (it is removed from the pending queue at the beginning of
>     _runTask()), the executor can be removed.
>
>     The executor/framework shutdown logic is not easy to follow at a glance,
>     so I ran an experiment.
>
>     I have uploaded a test and logs; they show that the framework can be
>     removed before __runTask() is called.
>     I really hope you can take a look to see if I missed anything. Thank
>     you!
>
>     I think maybe I can add the task to the pending queue again before
>     calling containerizer->update().

You are right. The executor/framework will not be removed in the normal course of things if the executor has a pending task. But they will be removed if the framework/executor was explicitly asked to shut down by the master. This is the case in your test. So yes, keep the 'if' block instead of the CHECK.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22313/#review45858
-----------------------------------------------------------


On June 19, 2014, 1:30 a.m., Yifan Gu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22313/
> -----------------------------------------------------------
>
> (Updated June 19, 2014, 1:30 a.m.)
>
>
> Review request for mesos, Ian Downes and Vinod Kone.
>
>
> Bugs: MESOS-886
>     https://issues.apache.org/jira/browse/MESOS-886
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> Added __runTask() to wait for the completion of containerizer->update() and
> check the result before sending RunTaskMessage.
>
>
> Diffs
> -----
>
>   src/slave/slave.hpp 34687e5
>   src/slave/slave.cpp 643c088
>   src/tests/slave_tests.cpp 2c8f183
>
> Diff: https://reviews.apache.org/r/22313/diff/
>
>
> Testing
> -------
>
> SlaveTest, CancelTaskIfContainerizerFails
>
> This tests that if containerizer->update() returns a Failure, the task
> won't be launched and the scheduler will get TASK_LOST.
>
> make check
>
>
> File Attachments
> ----------------
>
> framework will exit
>   https://reviews.apache.org/media/uploaded/files/2014/06/18/fbe73273-7aa9-4faa-b1c5-003ab03042a9__issue-886.diff
> log
>   https://reviews.apache.org/media/uploaded/files/2014/06/18/84d801a0-5c2a-4bb9-901b-e1962031461c__log
>
>
> Thanks,
>
> Yifan Gu
>
>
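
As a rough illustration of the conclusion in this thread (keep the 'if' block rather than a CHECK), here is a minimal, self-contained sketch. It is not the Mesos code: Framework, Slave, addFramework(), shutdownFramework() and continueRunTask() are hypothetical stand-ins, and a plain map lookup replaces the slave's real bookkeeping. It only shows why a continuation that runs after an asynchronous step has to tolerate the framework having been removed in the meantime.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the slave's per-framework state.
struct Framework {
  std::string id;
};

// Hypothetical stand-in for the slave; only models framework bookkeeping.
struct Slave {
  std::map<std::string, std::unique_ptr<Framework>> frameworks;

  void addFramework(const std::string& id) {
    frameworks[id] = std::unique_ptr<Framework>(new Framework{id});
  }

  // Returns nullptr when the framework is unknown (for example, removed).
  Framework* getFramework(const std::string& id) {
    auto it = frameworks.find(id);
    return it == frameworks.end() ? nullptr : it->second.get();
  }

  // Models the master explicitly asking the slave to shut a framework down.
  void shutdownFramework(const std::string& id) { frameworks.erase(id); }

  // Models the continuation that runs once the asynchronous update is done.
  // Returns true if the task would go on to be sent to the executor.
  bool continueRunTask(const std::string& frameworkId) {
    Framework* framework = getFramework(frameworkId);
    if (framework == nullptr) {
      // A shutdown may have arrived while the update was in flight, so this
      // is an expected case to handle, not an invariant to assert on.
      std::cout << "Ignoring task: framework " << frameworkId
                << " no longer exists" << std::endl;
      return false;
    }
    return true;
  }
};
```

In this sketch, calling continueRunTask() after shutdownFramework() for the same id takes the 'if' branch and returns false, whereas an assertion at that point would abort the process.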