becketqin opened a new pull request #13044:
URL: https://github.com/apache/flink/pull/13044


   ## What is the purpose of the change
   This patch fixes the `CheckpointCoordinator` so that it works with 
`ExternallyInducedSource` in cases where the task snapshots are triggered by 
external systems via master hooks, rather than by the checkpoint coordinator.
   
   The problem in the current code is that, when the task snapshots are 
triggered externally via the master hooks, the checkpoint coordinator may 
receive all the acks from the tasks before the master state snapshot completes, 
which causes the checkpoint to fail. The fix is to finalize the checkpoint only 
once the operator coordinator checkpoints, the master snapshots, and the task 
snapshots have all been taken.
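   
   As a rough illustration of the "finalize only when everything is 
acknowledged" rule described above, here is a minimal sketch. The class 
`PendingCheckpointSketch` and all of its methods are hypothetical names made up 
for this example; they are not Flink's actual `PendingCheckpoint` or 
`CheckpointCoordinator` API.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch only; not Flink's real PendingCheckpoint. */
class PendingCheckpointSketch {
    private final Set<String> pendingTasks;
    private final Set<String> pendingMasterHooks;
    private final Set<String> pendingCoordinators;
    private boolean finalized;

    PendingCheckpointSketch(
            Set<String> tasks, Set<String> masterHooks, Set<String> coordinators) {
        this.pendingTasks = new HashSet<>(tasks);
        this.pendingMasterHooks = new HashSet<>(masterHooks);
        this.pendingCoordinators = new HashSet<>(coordinators);
    }

    // Each acknowledgement path re-checks whether the checkpoint is complete,
    // so the result does not depend on which kind of ack arrives last.
    synchronized void acknowledgeTask(String taskId) {
        pendingTasks.remove(taskId);
        maybeFinalize();
    }

    synchronized void acknowledgeMasterHook(String hookId) {
        pendingMasterHooks.remove(hookId);
        maybeFinalize();
    }

    synchronized void acknowledgeCoordinator(String coordinatorId) {
        pendingCoordinators.remove(coordinatorId);
        maybeFinalize();
    }

    synchronized boolean isFullyAcknowledged() {
        return pendingTasks.isEmpty()
                && pendingMasterHooks.isEmpty()
                && pendingCoordinators.isEmpty();
    }

    synchronized boolean isFinalized() {
        return finalized;
    }

    // Finalize at most once, and only when nothing is left pending.
    private void maybeFinalize() {
        if (!finalized && isFullyAcknowledged()) {
            finalized = true;
        }
    }
}
```

   In this sketch a task ack that arrives before the master hook has finished 
simply leaves the checkpoint pending; the later master hook ack is what 
finalizes it.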
   
   This patch also fixes the order of the component checkpoints by moving the 
`OperatorCoordinator` checkpoint before the master hooks invocation.
   
   ## Brief change log
   - Finalize the checkpoint when it is fully acknowledged, which can be 
triggered by the acknowledgement of either a task snapshot or a master 
snapshot.
   - Adjust the checkpoint order from `(master hooks, OperatorCoordinators) -> 
tasks` to `OperatorCoordinators -> master hooks -> tasks`.
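   
   The new ordering can be pictured as a chain of asynchronous stages. The 
sketch below is only an illustration under assumed names 
(`CheckpointOrderSketch`, `snapshotOperatorCoordinators`, `snapshotMasterHooks`, 
`triggerTasks` are all hypothetical); it is not the actual 
`CheckpointCoordinator` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of the stage ordering; not Flink's real code. */
class CheckpointOrderSketch {
    // Records the order in which the stages start. All stages complete
    // synchronously in this sketch, so a plain ArrayList is sufficient.
    final List<String> log = new ArrayList<>();

    CompletableFuture<Void> snapshotOperatorCoordinators() {
        log.add("OperatorCoordinators");
        return CompletableFuture.completedFuture(null);
    }

    CompletableFuture<Void> snapshotMasterHooks() {
        log.add("master hooks");
        return CompletableFuture.completedFuture(null);
    }

    CompletableFuture<Void> triggerTasks() {
        log.add("tasks");
        return CompletableFuture.completedFuture(null);
    }

    // Each stage starts only after the previous stage's future completes.
    CompletableFuture<Void> triggerCheckpoint() {
        return snapshotOperatorCoordinators()
                .thenCompose(ignored -> snapshotMasterHooks())
                .thenCompose(ignored -> triggerTasks());
    }
}
```

   Calling `new CheckpointOrderSketch().triggerCheckpoint().join()` runs the 
three stages strictly in sequence, matching the order listed in the change log.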
   
   ## Verifying this change
   This change adds tests and can be verified by running 
`CheckpointCoordinatorTest#testTaskCheckpointTriggeredByMasterHooks()`.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

