Roman Khachatryan created FLINK-20672:
-----------------------------------------
Summary: CheckpointAborted RPC failure can fail JM
Key: FLINK-20672
URL: https://issues.apache.org/jira/browse/FLINK-20672
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing
Affects Versions: 1.11.3, 1.12.0
Reporter: Roman Khachatryan

Since FLINK-8871, checkpoint-aborted RPC notifications are sent asynchronously:
{code}
private void sendAbortedMessages(long checkpointId, long timeStamp) {
    // send notification of aborted checkpoints asynchronously.
    executor.execute(() -> {
        // send the "abort checkpoint" messages to necessary vertices.
        // ..
    });
}
{code}
However, the executor that eventually executes this request is created as follows:
{code}
final ScheduledExecutorService futureExecutor = Executors.newScheduledThreadPool(
    Hardware.getNumberCPUCores(),
    new ExecutorThreadFactory("jobmanager-future"));
{code}
ExecutorThreadFactory installs an UncaughtExceptionHandler that exits the JVM on error, so a failure while sending these notifications can bring down the JobManager.

cc: [~yunta]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
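To illustrate the failure mode described in this issue, here is a minimal standalone sketch that does not use any Flink classes. It makes several assumptions: the thread factory is a hypothetical stand-in for ExecutorThreadFactory whose handler records the error instead of exiting the JVM; a plain fixed-size pool is used because its `execute` path hands an uncaught task exception to the thread's handler; and the class name, `errorSeenByHandler`, and `guarded` are invented for illustration. The `guarded` wrapper shows one possible mitigation (catching the failure inside the task), which is not necessarily the fix adopted for this ticket.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicReference;

class AbortNotificationSketch {

    /**
     * Runs the task via execute() on a one-thread pool and returns the error that
     * reached the thread's UncaughtExceptionHandler, or null if none did.
     */
    static Throwable errorSeenByHandler(Runnable task) {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        List<Thread> workers = new CopyOnWriteArrayList<>();

        // Hypothetical stand-in for ExecutorThreadFactory: the handler records the
        // error where the real one would exit the JVM.
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, "jobmanager-future-sketch");
            t.setUncaughtExceptionHandler((thread, error) -> seen.set(error));
            workers.add(t);
            return t;
        };

        ExecutorService executor = Executors.newFixedThreadPool(1, factory);
        executor.execute(task);
        executor.shutdown(); // tasks that were already submitted still run

        try {
            // Wait for every worker (including a replacement started after a
            // crashed one) to terminate, so the handler has had its chance to run.
            for (int i = 0; i < workers.size(); i++) {
                workers.get(i).join(5_000);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    /** Wraps a task so that a failure is caught and recorded inside the task itself. */
    static Runnable guarded(Runnable task, AtomicReference<Throwable> reported) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                // A real fix would presumably log the failure; here it is only recorded.
                reported.set(t);
            }
        };
    }

    public static void main(String[] args) {
        // Simulates an "abort checkpoint" notification whose RPC call fails.
        Runnable failingAbort = () -> {
            throw new IllegalStateException("simulated RPC failure");
        };
        AtomicReference<Throwable> reported = new AtomicReference<>();

        System.out.println("unguarded, handler saw: " + errorSeenByHandler(failingAbort));
        System.out.println("guarded, handler saw:   "
                + errorSeenByHandler(guarded(failingAbort, reported)));
        System.out.println("guarded, task reported: " + reported.get());
    }
}
```

In the unguarded run the exception reaches the handler, which is the point where a fatal-exit handler would terminate the process. In the guarded run the handler is never invoked and the failure stays local to the notification task.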