ifndef-SleePy commented on a change in pull request #11347: [FLINK-14971][checkpointing] Make all the non-IO operations in CheckpointCoordinator single-threaded
URL: https://github.com/apache/flink/pull/11347#discussion_r395429228

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/PendingCheckpoint.java
##########
@@ -111,6 +108,14 @@
 	/** The executor for potentially blocking I/O operations, like state disposal. */
 	private final Executor executor;

+	/** The executor for non-blocking operations. */
+	private final Executor mainThreadExecutor;
+
+	private final CompletedCheckpointStore completedCheckpointStore;
+
+	/** The lock for avoiding conflict between I/O operations. */
+	private final Object operationLock = new Object();

Review comment:
   Yes, there is a small chance that the `CheckpointCoordinator` is shut down while a `PendingCheckpoint` is being finalized. In that case there could be concurrent access conflicts on `operatorStates` and `targetLocation`. This might not be a big deal, because the coordinator is shutting down anyway, and the finalization probably could not finish since the I/O executor would be shut down as well. However, it is not elegant to push the concurrency problem down to `CheckpointStorageLocation` and `OperatorState`, and making all of their implementations thread-safe just to cover this rare case would be a bit heavy. So I think introducing an external lock here is the better option.
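
   To illustrate the external-lock idea, here is a minimal sketch. It is not the actual `PendingCheckpoint` code: the class `GuardedCheckpoint`, its `closed` flag, and the methods `acknowledge`, `finalizeCheckpoint` and `dispose` are hypothetical names made up for this example, and a plain `ArrayList` stands in for the non-thread-safe state (`operatorStates`, `targetLocation`). The point is only that one lock around finalization and disposal lets the guarded objects stay non-thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for a pending checkpoint; not Flink's real class. */
final class GuardedCheckpoint {

    /** Single external lock guarding all state below. */
    private final Object operationLock = new Object();

    /** Deliberately not thread-safe; protected only by operationLock. */
    private final List<String> operatorStates = new ArrayList<>();

    /** Set once the checkpoint has been finalized or disposed. */
    private boolean closed;

    /** Records an acknowledged operator state unless already closed. */
    void acknowledge(String operatorState) {
        synchronized (operationLock) {
            if (!closed) {
                operatorStates.add(operatorState);
            }
        }
    }

    /**
     * Would run on the I/O executor. Returns a snapshot of the states,
     * or empty if dispose() won the race.
     */
    Optional<List<String>> finalizeCheckpoint() {
        synchronized (operationLock) {
            if (closed) {
                return Optional.empty();
            }
            closed = true;
            return Optional.of(new ArrayList<>(operatorStates));
        }
    }

    /**
     * May be called during coordinator shutdown. Returns true if this call
     * actually discarded the state, false if the checkpoint was already closed.
     */
    boolean dispose() {
        synchronized (operationLock) {
            if (closed) {
                return false;
            }
            closed = true;
            operatorStates.clear();
            return true;
        }
    }
}
```

   Whichever of `finalizeCheckpoint()` and `dispose()` takes the lock first wins, and the other becomes a no-op, so neither ever sees the state half-modified. The trade-off is that both paths serialize on the lock, which seems acceptable given how rarely the two overlap.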