Jiangjie Qin created FLINK-20222:
------------------------------------

             Summary: The CheckpointCoordinator should reset the 
OperatorCoordinators when a job fails before the first checkpoint.
                 Key: FLINK-20222
                 URL: https://issues.apache.org/jira/browse/FLINK-20222
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
            Reporter: Jiangjie Qin


Right now, if a job fails before the first successful checkpoint, the 
CheckpointCoordinator does not reset the OperatorCoordinator state. This may 
leave the OperatorCoordinators in an inconsistent state.

The CheckpointCoordinator should also reset the OperatorCoordinator state in 
this case, just like it does for the master hooks. It essentially means "reset 
to no checkpoint". There are two options for the fix:
 # Add a reset() method to the OperatorCoordinator.
 # Call resetToCheckpoint(null) on the OperatorCoordinator.
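
To make the two options concrete, here is a minimal, self-contained sketch. 
It does not use Flink's actual classes: SketchCoordinator, 
AssignmentCoordinator and restoreAfterFailure are hypothetical names invented 
for illustration, and the single-argument resetToCheckpoint(byte[]) signature 
is an assumption based on the call mentioned in option 2. A null argument is 
taken to mean "no checkpoint".

```java
import java.nio.charset.StandardCharsets;

public class ResetSketch {

    /** Hypothetical, simplified stand-in for a coordinator; not Flink's real interface. */
    public interface SketchCoordinator {
        byte[] checkpoint();

        /** Option 2: null is interpreted as "reset to no checkpoint". */
        void resetToCheckpoint(byte[] checkpointData);

        /** Option 1: a dedicated reset(), expressed here in terms of option 2. */
        default void reset() {
            resetToCheckpoint(null);
        }
    }

    /** Toy coordinator whose only state is a counter of assigned splits. */
    public static final class AssignmentCoordinator implements SketchCoordinator {
        private int assignedSplits;

        public void assignSplit() {
            assignedSplits++;
        }

        public int assignedSplits() {
            return assignedSplits;
        }

        @Override
        public byte[] checkpoint() {
            return Integer.toString(assignedSplits).getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public void resetToCheckpoint(byte[] checkpointData) {
            // No checkpoint yet: fall back to the initial (empty) state.
            assignedSplits = (checkpointData == null)
                    ? 0
                    : Integer.parseInt(new String(checkpointData, StandardCharsets.UTF_8));
        }
    }

    /**
     * Hypothetical failure handling: always reset the coordinator, even when
     * no checkpoint has completed (latestCheckpoint == null).
     */
    public static void restoreAfterFailure(SketchCoordinator coordinator, byte[] latestCheckpoint) {
        coordinator.resetToCheckpoint(latestCheckpoint);
    }

    public static void main(String[] args) {
        AssignmentCoordinator coordinator = new AssignmentCoordinator();
        coordinator.assignSplit();
        coordinator.assignSplit();

        // The job fails before the first checkpoint completes.
        restoreAfterFailure(coordinator, null);
        System.out.println("assigned splits after reset: " + coordinator.assignedSplits());
    }
}
```

With option 2 the existing restore path covers the new case, but every 
coordinator implementation has to tolerate a null argument. Option 1 keeps 
the two cases separate at the cost of one more method on the interface.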



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
