1996fanrui commented on code in PR #20233:
URL: https://github.com/apache/flink/pull/20233#discussion_r920121363


##########
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/SubtaskCheckpointCoordinatorImpl.java:
##########
@@ -316,6 +316,10 @@ public void checkpointState(
             // broadcast cancel checkpoint marker to avoid downstream back-pressure due to
             // checkpoint barrier align.
             operatorChain.broadcastEvent(new CancelCheckpointMarker(metadata.getCheckpointId()));
+            channelStateWriter.abort(
+                    metadata.getCheckpointId(),
+                    new CancellationException("checkpoint aborted via notification"),
+                    true);

Review Comment:
   Hi @pnowojski, thanks for your review.
   
   I don't see why we shouldn't execute the abort here.
   
   `checkAndClearAbortedStatus` is called twice:
   - The first call is in the `if (lastCheckpointId >= metadata.getCheckpointId())` branch, where we execute the abort and `checkAndClearAbortedStatus`.
   - The second call is here. I think we should abort here as well: `checkAndClearAbortedStatus(metadata.getCheckpointId()) == true` means this checkpoint ID has been aborted, so we can call `channelStateWriter.abort` too.
   
   I also think the new unit test reproduces this leak, that is, the `ChannelStateWriteResult` might never fail after the checkpoint is aborted.
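   
   To make the argument concrete, here is a minimal, self-contained sketch of the two branches. It is not the real Flink code: `AbortSketch`, `SketchWriter`, `SketchCoordinator`, `notifyAborted` and the `abortWriterOnNotification` flag are hypothetical stand-ins, and I am assuming that the channel state write has already been started for the checkpoint before the abort notification arrives.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

// Simplified model only; none of these classes exist in Flink.
class AbortSketch {

    /** Hypothetical stand-in for the channel state writer: one pending result per checkpoint. */
    static class SketchWriter {
        final Map<Long, CompletableFuture<Void>> results = new HashMap<>();

        CompletableFuture<Void> start(long checkpointId) {
            return results.computeIfAbsent(checkpointId, id -> new CompletableFuture<>());
        }

        void abort(long checkpointId, Throwable cause) {
            CompletableFuture<Void> result = results.remove(checkpointId);
            if (result != null) {
                result.completeExceptionally(cause);
            }
        }
    }

    /** Hypothetical stand-in for the subtask checkpoint coordinator. */
    static class SketchCoordinator {
        final SketchWriter writer = new SketchWriter();
        final Set<Long> abortedIds = new HashSet<>();
        final boolean abortWriterOnNotification; // true models the change proposed in this PR
        long lastCheckpointId = -1;

        SketchCoordinator(boolean abortWriterOnNotification) {
            this.abortWriterOnNotification = abortWriterOnNotification;
        }

        // Assumed behavior: remember aborts that arrive before the checkpoint is triggered.
        void notifyAborted(long checkpointId) {
            if (checkpointId > lastCheckpointId) {
                abortedIds.add(checkpointId);
            }
        }

        boolean checkAndClearAbortedStatus(long checkpointId) {
            return abortedIds.remove(checkpointId);
        }

        void checkpointState(long checkpointId) {
            // First call site: the checkpoint is outdated, so abort and clear the status.
            if (lastCheckpointId >= checkpointId) {
                writer.abort(checkpointId, new CancellationException("checkpoint subsumed"));
                checkAndClearAbortedStatus(checkpointId);
                return;
            }
            lastCheckpointId = checkpointId;

            // Second call site: the checkpoint was aborted via notification.
            if (checkAndClearAbortedStatus(checkpointId)) {
                // (broadcasting the cancel marker downstream is omitted in this sketch)
                if (abortWriterOnNotification) {
                    writer.abort(
                            checkpointId,
                            new CancellationException("checkpoint aborted via notification"));
                }
                return;
            }
            // The normal snapshot path is omitted.
        }
    }

    public static void main(String[] args) {
        for (boolean fix : new boolean[] {false, true}) {
            SketchCoordinator coordinator = new SketchCoordinator(fix);
            CompletableFuture<Void> result = coordinator.writer.start(1L);
            coordinator.notifyAborted(1L);
            coordinator.checkpointState(1L);
            System.out.println("fix=" + fix + " resultDone=" + result.isDone());
        }
    }
}
```

   In this model, without the extra `abort` call the result future for checkpoint 1 stays pending forever after the abort notification, which is the kind of leak I mean. With the call, the future fails with the `CancellationException`.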



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
