[ 
https://issues.apache.org/jira/browse/MESOS-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman reassigned MESOS-9573:
-------------------------------------

    Assignee: Gastón Kleiman

https://reviews.apache.org/r/69977/diff/1#index_header

> Agent should not try to recover operation status update streams that haven't 
> been created yet.
> ----------------------------------------------------------------------------------------------
>
>                 Key: MESOS-9573
>                 URL: https://issues.apache.org/jira/browse/MESOS-9573
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>            Reporter: Gastón Kleiman
>            Assignee: Gastón Kleiman
>            Priority: Major
>              Labels: foundations, mesosphere
>
> If the agent fails over after having checkpointed a new operation but before 
> the operation status update stream is created, the recovery process will fail.
> This happens because the agent will try to recover operation status update 
> streams even if they haven't been created yet.
> In order to prevent recovery failures, the agent should obtain the ids of the 
> streams to recover by walking the directory in which operation status update 
> streams are stored.
> The agent should also garbage collect streams if the checkpointed state 
> doesn't contain a corresponding operation.
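
The proposed recovery logic can be illustrated with a rough sketch. This is not Mesos code: the function name `recover_streams`, the one-directory-per-stream layout, and the use of the operation id as the directory name are all assumptions made for illustration. The idea is that the set of streams to recover comes from what exists on disk, so a checkpointed operation whose stream was never created is simply skipped, while a stream with no matching checkpointed operation is garbage collected.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def recover_streams(
    streams_dir: Path, checkpointed_ops: set[str]
) -> tuple[set[str], set[str]]:
    """Return (recovered, garbage_collected) stream ids.

    Hypothetical layout: streams_dir holds one subdirectory per stream,
    named after the operation id it belongs to.
    """
    recovered: set[str] = set()
    collected: set[str] = set()

    # No directory yet means no stream was ever created: nothing to recover.
    if not streams_dir.is_dir():
        return recovered, collected

    # Materialize the listing first, since entries may be deleted below.
    for entry in sorted(streams_dir.iterdir()):
        if not entry.is_dir():
            continue
        if entry.name in checkpointed_ops:
            # The stream exists on disk and its operation is known: recover it.
            recovered.add(entry.name)
        else:
            # Orphaned stream with no checkpointed operation: remove it.
            shutil.rmtree(entry)
            collected.add(entry.name)

    return recovered, collected
```

Note that a checkpointed operation without a stream directory appears in neither returned set, which is the case that currently makes recovery fail.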



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
