[
https://issues.apache.org/jira/browse/MESOS-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gastón Kleiman reassigned MESOS-9573:
-------------------------------------
Assignee: Gastón Kleiman
https://reviews.apache.org/r/69977/diff/1#index_header
> Agent should not try to recover operation status update streams that haven't
> been created yet.
> ----------------------------------------------------------------------------------------------
>
> Key: MESOS-9573
> URL: https://issues.apache.org/jira/browse/MESOS-9573
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Reporter: Gastón Kleiman
> Assignee: Gastón Kleiman
> Priority: Major
> Labels: foundations, mesosphere
>
> If the agent fails over after checkpointing a new operation but before the
> operation status update stream is created, the recovery process will fail.
> This happens because the agent tries to recover the operation status update
> stream even though it hasn't been created yet.
> To prevent such recovery failures, the agent should obtain the IDs of the
> streams to recover by walking the directory in which operation status update
> streams are stored.
> The agent should also garbage collect any stream for which the checkpointed
> state doesn't contain a corresponding operation.
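
The proposed recovery logic can be illustrated with a rough sketch. This is
not the Mesos implementation (the agent is written in C++). It assumes a
hypothetical on-disk layout with one subdirectory per stream, named after the
operation ID; the function names plan_recovery and garbage_collect are
invented for this example.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def plan_recovery(streams_dir: Path, checkpointed_ids: set[str]) -> tuple[set[str], set[str]]:
    """Return (stream IDs to recover, orphaned stream IDs to garbage collect).

    Assumption: each stream lives in its own subdirectory of `streams_dir`,
    and the subdirectory name is the operation ID.
    """
    on_disk: set[str] = set()
    if streams_dir.is_dir():
        # Walk the directory instead of trusting the checkpointed state, so
        # operations whose stream was never created are simply not seen.
        on_disk = {entry.name for entry in streams_dir.iterdir() if entry.is_dir()}

    to_recover = on_disk & checkpointed_ids  # stream exists and operation is known
    orphaned = on_disk - checkpointed_ids    # stream exists but operation is unknown
    return to_recover, orphaned


def garbage_collect(streams_dir: Path, orphaned: set[str]) -> None:
    """Delete stream directories that have no corresponding checkpointed operation."""
    for stream_id in orphaned:
        shutil.rmtree(streams_dir / stream_id)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "op-1").mkdir()  # checkpointed operation, stream created
        (root / "op-2").mkdir()  # stream without a checkpointed operation
        # "op-3" was checkpointed, but its stream was never created.
        to_recover, orphaned = plan_recovery(root, {"op-1", "op-3"})
        garbage_collect(root, orphaned)
        print(sorted(to_recover), sorted(orphaned))
```

In this sketch "op-3" models the failure case from the description: the
operation was checkpointed, but the agent failed over before its stream was
created. Because the set of streams comes from the directory walk, "op-3" is
never handed to stream recovery, while "op-2" is removed as an orphan.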