Gastón Kleiman created MESOS-9573:
-------------------------------------

             Summary: Agent should not try to recover operation status update 
streams that haven't been created yet.
                 Key: MESOS-9573
                 URL: https://issues.apache.org/jira/browse/MESOS-9573
             Project: Mesos
          Issue Type: Bug
          Components: agent
            Reporter: Gastón Kleiman


If the agent fails over after having checkpointed a new operation but before 
the operation status update stream is created, the recovery process will fail.

This happens because the agent will try to recover an operation status update 
stream even if it hasn't been created yet.

In order to prevent recovery failures, the agent should obtain the ids of the 
streams to recover by walking the directory in which operation status update 
streams are stored.
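
The directory walk could look roughly like the sketch below. It is written in 
Python for brevity and is not the agent's code. The function name 
discover_stream_ids and the on-disk layout (one subdirectory per stream, named 
after the operation UUID) are assumptions made for illustration only.

```python
import os
import uuid


def discover_stream_ids(streams_dir):
    """Return the IDs of the streams that actually exist on disk.

    Assumed (hypothetical) layout: one subdirectory per stream, named
    after the operation UUID. This is an illustration, not the layout
    used by the real agent.
    """
    ids = set()
    if not os.path.isdir(streams_dir):
        # No stream was ever created, so there is nothing to recover.
        return ids
    for entry in os.scandir(streams_dir):
        if not entry.is_dir():
            continue
        try:
            ids.add(uuid.UUID(entry.name))
        except ValueError:
            # Skip directories whose names are not valid UUIDs.
            continue
    return ids
```

Because the ids come from the directory rather than from the checkpointed 
operations, an operation whose stream was never created does not show up in 
the result, so recovery is not attempted for it.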

The agent should also garbage collect streams if the checkpointed state doesn't 
contain a corresponding operation.
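
A minimal sketch of that decision, in Python and purely illustrative: given 
the stream ids found on disk and the operation ids in the checkpointed state, 
split the streams into those to recover and those to garbage collect. The 
function name plan_recovery and both parameters are hypothetical and do not 
come from the Mesos code base.

```python
def plan_recovery(stream_ids_on_disk, checkpointed_operation_ids):
    """Split on-disk streams into (to_recover, to_garbage_collect).

    Both arguments are iterables of hashable ids. The names are
    hypothetical and used only for this illustration.
    """
    on_disk = set(stream_ids_on_disk)
    checkpointed = set(checkpointed_operation_ids)

    # Streams backed by a checkpointed operation are recovered.
    to_recover = on_disk & checkpointed

    # Streams without a corresponding checkpointed operation are orphans.
    to_garbage_collect = on_disk - checkpointed

    return to_recover, to_garbage_collect
```

An operation that was checkpointed but has no stream on disk lands in neither 
set, which corresponds to the failover case described above.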



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
