Yan Xu created MESOS-7832:
-----------------------------

             Summary: Mesos master during failover may not re-add completed tasks from agents belonging to frameworks that have yet to reregister
                 Key: MESOS-7832
                 URL: https://issues.apache.org/jira/browse/MESOS-7832
             Project: Mesos
          Issue Type: Bug
            Reporter: Yan Xu


Relevant code: 
https://github.pie.apple.com/pie/mesos/blob/cd3380c4e9521b4b26f9030658816eee7a4b89a1/src/master/master.cpp#L8611-L8617

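When an agent reregisters and reports completed tasks for a framework that has
not yet reregistered, the master discards the information about those tasks.
When the framework later subscribes, the tasks are not recovered.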

This is not ideal: after a master failover, the new master does not recover all
the information the previous master had, and the webUI looks odd because that
information is missing.

In the short term, we can store the information for such tasks temporarily and
delete it after a timeout if the related frameworks don't reregister.
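A minimal sketch of the short-term idea, assuming a standalone buffer keyed by framework ID. Every name here (PendingCompletedTasks, CompletedTask, add, recover, expire) is hypothetical and is not part of the Mesos codebase or its API; the timeout value and how often expire() runs (e.g. from a periodic timer) are also assumptions. It only shows the shape of the mechanism: buffer completed tasks for unknown frameworks, hand them back when the framework subscribes, and drop them once the deadline passes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT Mesos types.
using FrameworkId = std::string;
using Clock = std::chrono::steady_clock;

struct CompletedTask {
  std::string taskId;
  std::string state;  // e.g. "TASK_FINISHED"
};

// Holds completed tasks reported by reregistering agents for frameworks
// that have not yet reregistered with the new master.
class PendingCompletedTasks {
 public:
  explicit PendingCompletedTasks(Clock::duration timeout) : timeout_(timeout) {
    assert(timeout > Clock::duration::zero());
  }

  // Buffer a completed task. The framework's deadline is fixed when the
  // first task for that framework is buffered.
  void add(const FrameworkId& id, CompletedTask task, Clock::time_point now) {
    auto it = entries_.find(id);
    if (it == entries_.end()) {
      it = entries_.emplace(id, Entry{now + timeout_, {}}).first;
    }
    it->second.tasks.push_back(std::move(task));
  }

  // Called when the framework subscribes: return its buffered tasks and
  // forget them.
  std::vector<CompletedTask> recover(const FrameworkId& id) {
    std::vector<CompletedTask> result;
    auto it = entries_.find(id);
    if (it != entries_.end()) {
      result = std::move(it->second.tasks);
      entries_.erase(it);
    }
    return result;
  }

  // Drop the info for frameworks that did not reregister before their deadline.
  void expire(Clock::time_point now) {
    for (auto it = entries_.begin(); it != entries_.end();) {
      if (now >= it->second.deadline) {
        it = entries_.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Number of frameworks with buffered tasks.
  std::size_t size() const { return entries_.size(); }

 private:
  struct Entry {
    Clock::time_point deadline;
    std::vector<CompletedTask> tasks;
  };

  Clock::duration timeout_;
  std::map<FrameworkId, Entry> entries_;
};
```

Fixing the deadline at the first buffered task bounds how long the master holds info for a framework that never returns; refreshing the deadline on every new report would be an equally plausible choice.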

Of course, once frameworks are persisted in the registry, the master will have
better knowledge of whether a framework has completed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
