[GitHub] [flink] pnowojski commented on a diff in pull request #20233: [FLINK-28474][checkpoint] Fix the bug ChannelStateWriteResult might not fail after checkpoint abort

GitBox Wed, 13 Jul 2022 08:20:47 -0700


pnowojski commented on code in PR #20233:
URL: https://github.com/apache/flink/pull/20233#discussion_r920210091



##########
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/SubtaskCheckpointCoordinatorImpl.java:
##########
@@ -316,6 +316,10 @@ public void checkpointState(
             // broadcast cancel checkpoint marker to avoid downstream back-pressure due to
             // checkpoint barrier align.
             operatorChain.broadcastEvent(new CancelCheckpointMarker(metadata.getCheckpointId()));
+            channelStateWriter.abort(
+                    metadata.getCheckpointId(),
+                    new CancellationException("checkpoint aborted via notification"),
+                    true);

Review Comment:
   I think there is no harm in executing the abort call here, but it is still 
only "best effort". Take a look at how 
`SubtaskCheckpointCoordinatorImpl#createAbortedCheckpointSetWithLimitSize` 
constructs the `abortedCheckpointIds` set. If many checkpoints fail, the set 
will be pruned, the aborted checkpoint's ID may no longer be in it, and this 
code won't be triggered: `channelStateWriter.abort` won't be called.
   
   What exact issue was this bug causing in FLINK-26803 and what were the 
symptoms? 
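
To illustrate the pruning concern, here is a minimal sketch, not the actual Flink code. It assumes the aborted-ID set is a size-limited, insertion-ordered set that evicts its oldest entry on overflow, and that the abort branch in the diff runs only when the checkpoint ID can still be removed from that set. `BoundedAbortedSetSketch`, `createBoundedSet`, and `wouldAbort` are hypothetical names used only for this example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class BoundedAbortedSetSketch {

    // Hypothetical helper: a set that keeps at most `limit` IDs and
    // evicts the oldest inserted ID once the limit is exceeded.
    public static Set<Long> createBoundedSet(int limit) {
        return Collections.newSetFromMap(
                new LinkedHashMap<Long, Boolean>() {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                        return size() > limit;
                    }
                });
    }

    // Hypothetical stand-in for the guarded branch (assumption): the abort
    // path is taken only if the ID is still present in the set.
    public static boolean wouldAbort(Set<Long> abortedCheckpointIds, long checkpointId) {
        return abortedCheckpointIds.remove(checkpointId);
    }

    public static void main(String[] args) {
        Set<Long> aborted = createBoundedSet(2);
        aborted.add(1L);
        aborted.add(2L);
        aborted.add(3L); // exceeds the limit, so the oldest ID (1) is evicted

        System.out.println("abort for checkpoint 1: " + wouldAbort(aborted, 1L));
        System.out.println("abort for checkpoint 3: " + wouldAbort(aborted, 3L));
    }
}
```

Under these assumptions, an ID evicted by later failures is never seen by the guarded branch, so the abort call is skipped for that checkpoint.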



