Yan Xu created MESOS-7832:
-----------------------------
Summary: Mesos master during failover may not re-add completed
tasks from agents belonging to frameworks that have yet to reregister
Key: MESOS-7832
URL: https://issues.apache.org/jira/browse/MESOS-7832
Project: Mesos
Issue Type: Bug
Reporter: Yan Xu
Relevant code:
https://github.pie.apple.com/pie/mesos/blob/cd3380c4e9521b4b26f9030658816eee7a4b89a1/src/master/master.cpp#L8611-L8617
The master discards the info about these completed tasks, so when the
framework later subscribes, the tasks are not recovered.
It's not ideal that, after a master failover, the new master doesn't recover
all the info that the previous master possessed, and the webUI is left
displaying incomplete info.
In the short term, we can store the info for such tasks temporarily and delete
it after a timeout if the related frameworks don't reregister.
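A minimal sketch of that short-term idea, written in Python for brevity rather
than as a patch against master.cpp. Every name here (CompletedTaskBuffer, hold,
claim, expire, timeout_secs) is hypothetical and does not exist in Mesos; the
sketch also assumes the timeout is measured from the first time tasks are held
for a framework, which the proposal above does not specify.

```python
from typing import Dict, List, Tuple


class CompletedTaskBuffer:
    """Hypothetical holding area for completed tasks whose framework
    has not reregistered with the new master yet."""

    def __init__(self, timeout_secs: float) -> None:
        self.timeout_secs = timeout_secs
        # framework_id -> (time the first tasks were held, held task ids)
        self._held: Dict[str, Tuple[float, List[str]]] = {}

    def hold(self, framework_id: str, task_ids: List[str], now: float) -> None:
        """Keep completed tasks reported by a reregistering agent
        instead of discarding them."""
        first_seen, tasks = self._held.get(framework_id, (now, []))
        self._held[framework_id] = (first_seen, tasks + list(task_ids))

    def claim(self, framework_id: str) -> List[str]:
        """Called when the framework reregisters: hand back its held
        tasks and stop tracking them."""
        _, tasks = self._held.pop(framework_id, (0.0, []))
        return tasks

    def expire(self, now: float) -> List[str]:
        """Drop held tasks for frameworks that did not reregister within
        the timeout; return the ids of the frameworks dropped."""
        expired = [
            framework_id
            for framework_id, (first_seen, _) in self._held.items()
            if now - first_seen >= self.timeout_secs
        ]
        for framework_id in expired:
            del self._held[framework_id]
        return expired
```

In this sketch the master would call hold() where it currently drops the
completed tasks, claim() when the framework subscribes, and expire() from a
periodic timer.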
Of course, once we persist the frameworks in the registry, the master will have
better knowledge of whether a framework has completed.