gaoyunhaii opened a new pull request #16977:
URL: https://github.com/apache/flink/pull/16977
## What is the purpose of the change
This PR fixes an issue in `CheckpointBarrierAligner`: if a barrier is faked
because a checkpoint trigger was received via RPC while the channel is not yet
finished, we should not resume that channel on alignment. If we still resume
it, exceptions will be thrown, since the upstream subpartition was never
blocked.
The current implementation distinguishes whether the barrier was received from
RPC when processing the barrier. So far I have not come up with a better option.
## Brief change log
- 374815385e0790d86e33198354c326d0955d3f38 Modifies the
`BarrierHandlerState` implementations so that they support skipping marking
a channel as blocked; the skipped channels are then not resumed once the
barriers are aligned.
- 81b5dd80850a084c70a04a97c3b2879d7fcf5e98 Modifies `processBarrier` to pass
the information of whether the barrier was triggered via RPC.
- 1c2ab01420ed2e4d88a74522f0bb611f6d53b751 Enhances the test for this case.
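
The idea behind these commits can be illustrated with a minimal, self-contained sketch. It is a toy model under stated assumptions, not the actual Flink code: `AlignerSketch`, its fields, and `resumedChannels` are hypothetical names, and the real `BarrierHandlerState` implementations are considerably more involved. The sketch only shows that a channel whose barrier was faked by an RPC trigger still counts towards alignment but is never marked blocked, so it is not resumed afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical toy model of barrier alignment; NOT Flink's actual classes or API.
class AlignerSketch {
    private final int numChannels;
    // Channels for which a barrier (real or faked) has been processed.
    private final TreeSet<Integer> barrierSeen = new TreeSet<>();
    // Channels that were actually marked blocked; only these get resumed.
    private final TreeSet<Integer> blocked = new TreeSet<>();
    // Records every channel that was resumed, in order, for inspection.
    private final List<Integer> resumed = new ArrayList<>();

    AlignerSketch(int numChannels) {
        this.numChannels = numChannels;
    }

    // isRpcTriggered == true models a barrier faked due to an RPC trigger:
    // the upstream subpartition is not blocked, so the channel is not marked.
    void processBarrier(int channel, boolean isRpcTriggered) {
        barrierSeen.add(channel);
        if (!isRpcTriggered) {
            blocked.add(channel);
        }
        if (barrierSeen.size() == numChannels) {
            // Aligned: resume only the channels that were really blocked.
            resumed.addAll(blocked);
            blocked.clear();
            barrierSeen.clear();
        }
    }

    List<Integer> resumedChannels() {
        return resumed;
    }

    public static void main(String[] args) {
        AlignerSketch aligner = new AlignerSketch(2);
        aligner.processBarrier(0, false); // real barrier: channel 0 is blocked
        aligner.processBarrier(1, true);  // faked barrier: channel 1 is skipped
        System.out.println(aligner.resumedChannels()); // prints [0]
    }
}
```

In this model, resuming channel 1 would correspond to the erroneous case described above, where a resume request reaches a subpartition that was never blocked.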
## Verifying this change
This change is already covered by the enhanced test.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): **no**
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: **no**
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **no**
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: **no**
- The S3 file system connector: **no**
## Documentation
- Does this pull request introduce a new feature? **no**
- If yes, how is the feature documented? **not applicable**