Vinod Kone created MESOS-646:
--------------------------------

             Summary: Slave recovery doesn't properly handle checkpointed queued tasks
                 Key: MESOS-646
                 URL: https://issues.apache.org/jira/browse/MESOS-646
             Project: Mesos
          Issue Type: Bug
            Reporter: Vinod Kone
            Assignee: Vinod Kone
             Fix For: 0.14.0


If the slave dies after checkpointing a queued task but before the task is
launched on an executor, the slave doesn't have enough information to relaunch
it (because we only checkpoint Task instead of TaskInfo).
When the executor re-registers, the slave should simply remove these tasks
from its map.
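
The proposed handling can be sketched roughly as follows. This is only an
illustration, not the actual slave code: the Task and Executor structs, the
queuedTasks/launchedTasks maps, and dropQueuedTasks() are hypothetical,
simplified stand-ins, and the sketch assumes every queued task recovered from
a checkpoint is dropped when the executor re-registers.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the checkpointed Task record. It identifies
// the task but carries no TaskInfo, so it cannot be used to relaunch it.
struct Task {
  std::string id;
};

// Hypothetical, simplified view of the slave's per-executor state.
struct Executor {
  std::map<std::string, Task> queuedTasks;    // Checkpointed, never launched.
  std::map<std::string, Task> launchedTasks;  // Already sent to the executor.
};

// Sketch of what the slave could do when a recovered executor re-registers:
// remove every recovered queued task from the map and return the removed
// task IDs so the caller can log or otherwise account for them.
// Launched tasks are left untouched.
std::vector<std::string> dropQueuedTasks(Executor& executor) {
  std::vector<std::string> dropped;
  for (const auto& entry : executor.queuedTasks) {
    dropped.push_back(entry.first);
  }
  executor.queuedTasks.clear();
  return dropped;
}
```

Returning the dropped IDs rather than discarding them silently is a design
choice of this sketch; what the slave does with them is left open here.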

Alternatively, the slave could checkpoint TaskInfo instead of Task. We don't do
this because TaskInfo.data could potentially be huge.

