Bernd Mathiske created MESOS-2068:
-------------------------------------

             Summary: Add comments that explain framework, executor ID, and task life cycle in slave
                 Key: MESOS-2068
                 URL: https://issues.apache.org/jira/browse/MESOS-2068
             Project: Mesos
          Issue Type: Improvement
          Components: slave
            Reporter: Bernd Mathiske
            Assignee: Bernd Mathiske
            Priority: Minor


Fixing MESOS-947 was relatively difficult because the source code is practically 
the only source of information about the life cycle of frameworks, executors, 
and tasks in the slave. In particular, this leads to confusion about whether 
there could be a TASK_LOST situation at the beginning of _runTask() when the 
framework is NULL. This shall be explained to the best of the assignee's 
knowledge.

For context see https://reviews.apache.org/r/27567
with these comments:

On Nov. 5, 2014, 7:50 p.m., Ben Mahler wrote:
src/slave/slave.cpp, lines 1195-1200
<https://reviews.apache.org/r/27567/diff/1/?file=748326#file748326line1195>

   A comment here as to why we don't need to send TASK_LOST would be much 
appreciated! It's not obvious so someone might come along and add a TASK_LOST 
to make sure we're not dropping the task on the floor, so context here would be 
great!

Bernd Mathiske wrote:
   Hah, thanks for sharing - I am not alone! :-) None of this was obvious to me 
either, because there is no comment explaining the general life cycle of 
anything. Once you understand the intended life cycle, there is no way there 
can be a TASK_LOST situation here, though. Therefore I propose adding comments 
describing the overall picture regarding frameworks, executor IDs and task 
creation in the appropriate places, instead. I'll file a ticket if you agree.

Once you understand the intended life cycle, there is no way there can be a 
TASK_LOST situation here, though.

Phew! :)

Could you distill your learnings into a comment here, and maybe make the log 
message more informative? Even with an overall description as you mentioned, 
dummies like me would still get confused here given the lack of _local_ 
context. ;)

- Ben




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
