Piotr Nowojski created FLINK-23741:
--------------------------------------

             Summary: Waiting for final checkpoint can deadlock job
                 Key: FLINK-23741
                 URL: https://issues.apache.org/jira/browse/FLINK-23741
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing, Runtime / Task
    Affects Versions: 1.14.0
            Reporter: Piotr Nowojski


With {{ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH}} enabled, the final checkpoint can 
deadlock the job (or time out after a very long time) if there is a race between 
selecting the tasks to trigger the checkpoint on and those tasks finishing. 
FLINK-21246 was supposed to handle this, but it doesn't work as expected, 
because the futures from:
org.apache.flink.runtime.taskexecutor.TaskExecutor#triggerCheckpoint
and
org.apache.flink.streaming.runtime.tasks.StreamTask#triggerCheckpointAsync
are not linked together. {{TaskExecutor#triggerCheckpoint}} reports that the 
checkpoint has been successfully triggered, while the {{StreamTask}} might 
actually have finished already.
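
To illustrate what "not linked" means here, below is a minimal, self-contained 
sketch using plain {{CompletableFuture}}. It is not Flink code: the class 
{{SketchTask}} and the methods {{triggerUnlinked}} and {{triggerLinked}} are 
hypothetical names, and the boolean result is an assumed stand-in for "the task 
was still running and accepted the trigger".

```java
import java.util.concurrent.CompletableFuture;

public class UnlinkedFuturesSketch {

    /** Hypothetical stand-in for a task; this is NOT Flink's StreamTask. */
    static class SketchTask {
        private final boolean finished;

        SketchTask(boolean finished) {
            this.finished = finished;
        }

        /** Completes with true only if the task is still running (assumed semantics). */
        CompletableFuture<Boolean> triggerCheckpointAsync(long checkpointId) {
            return CompletableFuture.completedFuture(!finished);
        }
    }

    /** Unlinked: the task's future is dropped and success is reported unconditionally. */
    static CompletableFuture<Boolean> triggerUnlinked(SketchTask task, long checkpointId) {
        task.triggerCheckpointAsync(checkpointId); // result ignored
        return CompletableFuture.completedFuture(true);
    }

    /** Linked: the caller's future carries the task's actual outcome. */
    static CompletableFuture<Boolean> triggerLinked(SketchTask task, long checkpointId) {
        return task.triggerCheckpointAsync(checkpointId);
    }

    public static void main(String[] args) {
        SketchTask finishedTask = new SketchTask(true);
        System.out.println("unlinked: " + triggerUnlinked(finishedTask, 1L).join());
        System.out.println("linked:   " + triggerLinked(finishedTask, 1L).join());
    }
}
```

For a task that has already finished, the unlinked variant still reports 
success, so the caller has no signal that the trigger was lost and keeps 
waiting for a checkpoint that will not arrive. The linked variant surfaces the 
failure to the caller.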



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
