Re: Review Request 23912: Fix MESOS-947: Slave should properly handle a killTask() that arrives between runTask() and _runTask()

Vinod Kone Tue, 05 Aug 2014 14:30:12 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23912/#review49552
-----------------------------------------------------------




src/slave/slave.hpp
<https://reviews.apache.org/r/23912/#comment86682>

    Can you add a comment here saying you made these virtual so that you can 
mock the Slave?



src/slave/slave.cpp
<https://reviews.apache.org/r/23912/#comment86683>

    thank you.



src/slave/slave.cpp
<https://reviews.apache.org/r/23912/#comment86882>

    I would recommend pulling this logic outside 'if (executor == NULL)' to 
#1326 to make it easy to reason about.
    
    also, task ids are globally unique. there shouldn't be multiple executors 
with the same task id. you could actually do a CHECK(executorIds.size() == 1) 
to confirm that.



src/slave/slave.cpp
<https://reviews.apache.org/r/23912/#comment86884>

    s/executor is running/it was launched/



src/slave/slave.cpp
<https://reviews.apache.org/r/23912/#comment86883>

    s/Task killed early/Task killed before it was launched/



src/tests/mesos.hpp
<https://reviews.apache.org/r/23912/#comment86873>

    



src/tests/mesos.hpp
<https://reviews.apache.org/r/23912/#comment86874>

    



src/tests/mesos.hpp
<https://reviews.apache.org/r/23912/#comment86875>

    



src/tests/mesos.cpp
<https://reviews.apache.org/r/23912/#comment86860>

    end with a period.



src/tests/mesos.cpp
<https://reviews.apache.org/r/23912/#comment86861>

    were you unable to directly call Slave::* methods here instead of having 
the redirection through unmocked_* methods?



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/23912/#comment86863>

    unused?



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/23912/#comment86864>

    To simplify this boiler plate you could use the LaunchTasks action. See 
TerminatingSlaveDoesNotReregister test in this file for an example.



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/23912/#comment86871>

    Kill these expectations since an executor won't be launched in this test.



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/23912/#comment86867>

    s/Slave::_runTask/'Slave::_runTask()'/



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/23912/#comment86868>

    s/Slave::_runTask/'Slave::_runTask()'/



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/23912/#comment86876>

    s/Slave::runTask/'Slave::runTask()'/



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/23912/#comment86877>

    s/Slave::killTask/'Slave::killTask()'/



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/23912/#comment86878>

    s/MockedSlave::unmocked__runTask/'MockedSlave::unmocked__runTask()'/



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/23912/#comment86872>

    kill this.


- Vinod Kone


On July 28, 2014, 8:24 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23912/
> -----------------------------------------------------------
> 
> (Updated July 28, 2014, 8:24 p.m.)
> 
> 
> Review request for mesos.
> 
> 
> Bugs: MESOS-947
>     https://issues.apache.org/jira/browse/MESOS-947
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Fixes MESOS-947 "Slave should properly handle a killTask() that arrives 
> between runTask() and _runTask()".
> 
> Slave::killTask() did not check for task in question combination to be 
> "pending" (i.e. Slave::runTask had happened, but Slave::_runTask had not yet) 
> and then erroneously assumed that Slave::runTask() had not been executed. The 
> task was then marked "LOST" instead of "KILLED". But Slave::runTask had 
> already scheduled Slave::_runTask to follow. Now the entry for being 
> "pending" is removed, and the task is marked "KILLED", and _runTask gets 
> informed about this. It checks whether the task in question is currently 
> "pending" and if it is not, then it infers that the task has been killed and 
> does not erroneously try to complete launching it.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp a896bb66db5d8cd27ef02b6498c9db93cb0d525f 
>   src/slave/slave.cpp 1d5691836822c8587e1aa8ed24860a8012c67a6e 
>   src/tests/mesos.hpp 75c66fda2485afa0d4541e710780d90b3411839a 
>   src/tests/mesos.cpp 35c94fa908ad728ea92a7d1bfcbe90d57b1b83d9 
>   src/tests/slave_tests.cpp e45255a6f699e51bf09397da95a5a11edbabe591 
> 
> Diff: https://reviews.apache.org/r/23912/diff/
> 
> 
> Testing
> -------
> 
> Wrote a unit test that reliably created the situation described in the 
> ticket. Observed that TASK_LOST and the listed log output occurred. This 
> pointed directly to the lines in killTask() where the problem is rooted. Ran 
> the test after fixing, it succeeded. Checked the log. It looks like a "clean 
> kill" now :-)
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>

Re: Review Request 23912: Fix MESOS-947: Slave should properly handle a killTask() that arrives between runTask() and _runTask()

Reply via email to