[jira] [Updated] (FLINK-20222) The CheckpointCoordinator should reset the OperatorCoordinators when fail before the first checkpoint.

Robert Metzger (Jira) Wed, 18 Nov 2020 05:47:04 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-20222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Robert Metzger updated FLINK-20222:
-----------------------------------
    Fix Version/s: 1.12.0

> The CheckpointCoordinator should reset the OperatorCoordinators when fail 
> before the first checkpoint.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-20222
>                 URL: https://issues.apache.org/jira/browse/FLINK-20222
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>            Reporter: Jiangjie Qin
>            Assignee: Stephan Ewen
>            Priority: Critical
>             Fix For: 1.12.0
>
>
> Right now, if a job fails before the first successful checkpoint, the 
> CheckpointCoordinator does not reset the OperatorCoordinator state. This may 
> leave the OperatorCoordinators in an inconsistent state.
> The CheckpointCoordinator should also reset the OperatorCoordinator state in 
> this case, just like it does for the master hooks. It essentially means 
> "reset to no checkpoint". There are two options for the fix:
>  # Add a reset() method to the OperatorCoordinator.
>  # Call resetToCheckpoint(null) on the OperatorCoordinator.
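
A minimal sketch of the second option, for illustration only. It does not use Flink's actual classes: SketchCoordinator, CountingCoordinator, SketchCheckpointCoordinator, and restoreOnFailure are hypothetical names, and the single-argument resetToCheckpoint signature and the comma-separated state format are assumptions made for this example. The point is that a null argument is treated as "reset to no checkpoint", so state built up before the failure is discarded even when no checkpoint has completed yet.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a coordinator; NOT Flink's actual interface. */
interface SketchCoordinator {
    /** Restores state from checkpoint data; null is assumed to mean "reset to no checkpoint". */
    void resetToCheckpoint(byte[] checkpointData);
}

/** Toy coordinator that remembers which splits it has handed out. */
class CountingCoordinator implements SketchCoordinator {
    private final List<String> assigned = new ArrayList<>();

    void assign(String split) {
        assigned.add(split);
    }

    int assignedCount() {
        return assigned.size();
    }

    @Override
    public void resetToCheckpoint(byte[] checkpointData) {
        // Always drop state accumulated since the last restore point.
        assigned.clear();
        if (checkpointData != null) {
            // Assumed toy format: comma-separated split names.
            String state = new String(checkpointData, StandardCharsets.UTF_8);
            if (!state.isEmpty()) {
                for (String split : state.split(",")) {
                    assigned.add(split);
                }
            }
        }
    }
}

/** Hypothetical failure-handling path; NOT Flink's CheckpointCoordinator. */
class SketchCheckpointCoordinator {
    /** latestCheckpoint is null when the job failed before any checkpoint completed. */
    static void restoreOnFailure(List<? extends SketchCoordinator> coordinators,
                                 byte[] latestCheckpoint) {
        for (SketchCoordinator coordinator : coordinators) {
            // Reset unconditionally, instead of skipping coordinators when no checkpoint exists.
            coordinator.resetToCheckpoint(latestCheckpoint);
        }
    }
}
```

The first option would instead add a separate reset() method with the same effect as the null branch above. Reusing resetToCheckpoint keeps the interface smaller, at the cost of giving null a special meaning.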



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
