becketqin opened a new pull request #13044:
URL: https://github.com/apache/flink/pull/13044
## What is the purpose of the change
This patch fixes the `CheckpointCoordinator` so that it works with
`ExternallyInducedSource` when the task snapshots are triggered by
external systems via master hooks rather than by the checkpoint coordinator.
In the current code, when the task snapshots are triggered externally via
the master hooks, the checkpoint coordinator may receive all the acks from
the tasks before the master state snapshot completes, which causes the
checkpoint to fail. The fix is to finalize the checkpoint only once the
operator coordinator checkpoints, the master snapshots, and the task
snapshots have all been taken.
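
The finalization rule described above can be sketched as follows. This is a minimal illustration, not the actual Flink code: `PendingCheckpointSketch` and all of its method names are hypothetical, and the real `CheckpointCoordinator` tracks considerably more state.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a pending checkpoint; not Flink's actual class. */
final class PendingCheckpointSketch {
    private final Set<String> tasksAwaitingAck;
    private boolean coordinatorStateTaken;
    private boolean masterStateTaken;
    private boolean finalized;

    PendingCheckpointSketch(String... taskIds) {
        this.tasksAwaitingAck = new HashSet<>(Arrays.asList(taskIds));
    }

    // Every acknowledgement path ends in the same check, so whichever
    // component finishes last is the one that finalizes the checkpoint.
    synchronized void acknowledgeCoordinatorState() {
        coordinatorStateTaken = true;
        maybeFinalize();
    }

    synchronized void acknowledgeMasterState() {
        masterStateTaken = true;
        maybeFinalize();
    }

    synchronized void acknowledgeTask(String taskId) {
        tasksAwaitingAck.remove(taskId);
        maybeFinalize();
    }

    synchronized boolean isFinalized() {
        return finalized;
    }

    // Finalize only when all three kinds of state have been collected.
    private void maybeFinalize() {
        if (!finalized
                && coordinatorStateTaken
                && masterStateTaken
                && tasksAwaitingAck.isEmpty()) {
            finalized = true;
        }
    }
}
```

With this shape, task acks that arrive before the master state snapshot completes leave the checkpoint pending instead of failing it.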
This patch also fixes the order of the component checkpoints by moving the
`OperatorCoordinator` checkpoint before the master hooks invocation.
## Brief change log
- Finalize the checkpoint once it is fully acknowledged. Finalization can
be triggered by the acknowledgement of either a task snapshot or the master
snapshot.
- Adjust the checkpoint order from `(master hooks, OperatorCoordinators) ->
tasks` to `OperatorCoordinators -> master hooks -> tasks`.
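
As a rough sketch of the adjusted order, the three stages can be chained so that each one starts only after the previous one completes. The class `TriggerOrderSketch` and its method are hypothetical and only record the order of the stages; they do not reflect how the `CheckpointCoordinator` is actually structured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical illustration of the trigger order; not Flink's actual code. */
final class TriggerOrderSketch {
    static List<String> triggerCheckpoint() {
        List<String> order = new ArrayList<>();
        CompletableFuture
                // Stage 1: checkpoint the OperatorCoordinators.
                .runAsync(() -> order.add("OperatorCoordinators"))
                // Stage 2: invoke the master hooks once stage 1 is done.
                .thenRun(() -> order.add("master hooks"))
                // Stage 3: trigger the tasks once the hooks have been invoked.
                .thenRun(() -> order.add("tasks"))
                .join();
        return order;
    }
}
```

Calling `TriggerOrderSketch.triggerCheckpoint()` returns the stages in the order `OperatorCoordinators`, `master hooks`, `tasks`.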
## Verifying this change
This change added tests and can be verified by running
`CheckpointCoordinatorTest#testTaskCheckpointTriggeredByMasterHooks()`.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes)
- The S3 file system connector: (no)
## Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)