fanrui created FLINK-28474:
------------------------------
Summary: ChannelStateWriteResult may not fail after checkpoint
abort
Key: FLINK-28474
URL: https://issues.apache.org/jira/browse/FLINK-28474
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing
Affects Versions: 1.15.1, 1.14.5
Reporter: fanrui
Fix For: 1.16.0, 1.15.2, 1.14.6
Attachments: image-2022-07-09-22-21-24-417.png
After a checkpoint is aborted, its ChannelStateWriteResult should fail.
However, if _channelStateWriter.start(id, checkpointOptions);_ is executed after
the abort, the ChannelStateWriteResult never fails.
h2. Cause Analysis:
When a checkpoint is aborted, channelStateWriter.start(id, checkpointOptions); may not
have been executed yet. The IDs of such checkpoints are stored in the
abortedCheckpointIds of SubtaskCheckpointCoordinatorImpl, and when
checkpointState is called, it checks whether the checkpointId should be aborted.
_ChannelStateWriter.abort(checkpointId, exception, true) should also be
executed here._
!image-2022-07-09-22-21-24-417.png|width=803,height=307!
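The ordering problem described in the cause analysis can be reproduced with a small single-threaded model. This is only a sketch under stated assumptions: SketchChannelStateWriter and SketchCoordinator are hypothetical stand-ins, not the real Flink classes, a CompletableFuture stands in for ChannelStateWriteResult, and the abortWriterOnLateStart flag exists only to compare the behavior with and without the proposed abort call.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for ChannelStateWriter; not the Flink implementation.
class SketchChannelStateWriter {
    // One pending result per started checkpoint (models ChannelStateWriteResult).
    private final Map<Long, CompletableFuture<Void>> results = new HashMap<>();

    CompletableFuture<Void> start(long checkpointId) {
        return results.computeIfAbsent(checkpointId, id -> new CompletableFuture<>());
    }

    // Fails the result if the checkpoint was started; otherwise does nothing.
    void abort(long checkpointId, Throwable cause) {
        CompletableFuture<Void> result = results.remove(checkpointId);
        if (result != null) {
            result.completeExceptionally(cause);
        }
    }
}

// Hypothetical stand-in for SubtaskCheckpointCoordinatorImpl.
class SketchCoordinator {
    private final Set<Long> abortedCheckpointIds = new HashSet<>();
    private final SketchChannelStateWriter writer = new SketchChannelStateWriter();
    // Assumption for illustration: toggles the abort call proposed in this issue.
    private final boolean abortWriterOnLateStart;

    SketchCoordinator(boolean abortWriterOnLateStart) {
        this.abortWriterOnLateStart = abortWriterOnLateStart;
    }

    // The abort notification arrives; the writer may not have been started yet,
    // in which case writer.abort is a no-op and only the ID is remembered.
    void notifyCheckpointAborted(long checkpointId) {
        abortedCheckpointIds.add(checkpointId);
        writer.abort(checkpointId, new CancellationException("checkpoint aborted"));
    }

    // Models channelStateWriter.start(id, checkpointOptions) running late.
    CompletableFuture<Void> startWriter(long checkpointId) {
        return writer.start(checkpointId);
    }

    // Returns false if the checkpoint was already aborted and must be skipped.
    boolean checkpointState(long checkpointId) {
        if (abortedCheckpointIds.remove(checkpointId)) {
            if (abortWriterOnLateStart) {
                // Proposed change: also fail the result that was started late.
                writer.abort(checkpointId, new CancellationException("checkpoint aborted"));
            }
            return false;
        }
        return true;
    }
}
```

In this model, calling notifyCheckpointAborted(1), then startWriter(1), then checkpointState(1) leaves the returned future incomplete when the flag is false, and completes it exceptionally when the flag is true. The real code path is multi-threaded, so this sketch only captures the ordering of calls, not the synchronization.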
--
This message was sent by Atlassian Jira
(v8.20.10#820010)