Piotr Nowojski created FLINK-23741:
--------------------------------------
Summary: Waiting for final checkpoint can deadlock job
Key: FLINK-23741
URL: https://issues.apache.org/jira/browse/FLINK-23741
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing, Runtime / Task
Affects Versions: 1.14.0
Reporter: Piotr Nowojski
With {{ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH}} enabled, the final checkpoint can
deadlock (or time out after a very long time) if selecting the tasks to trigger
the checkpoint on races with tasks finishing. FLINK-21246 was supposed to handle
this case, but it doesn't work as expected because the futures returned by
org.apache.flink.runtime.taskexecutor.TaskExecutor#triggerCheckpoint
and
org.apache.flink.streaming.runtime.tasks.StreamTask#triggerCheckpointAsync
are not linked together. {{TaskExecutor#triggerCheckpoint}} reports that the
checkpoint has been successfully triggered even though the {{StreamTask}} might
already have finished.
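
The difference between unlinked and linked futures can be illustrated with a
minimal, self-contained sketch. It does not use Flink's classes: the class
TriggerFutureSketch, its methods, the String "ACK" acknowledgement and the
Boolean task-side result are all hypothetical stand-ins chosen for illustration,
and the linked variant is only one possible way to propagate the result, not
necessarily the fix that will be applied.

```java
import java.util.concurrent.CompletableFuture;

public class TriggerFutureSketch {

    // Hypothetical stand-in for the task-side trigger call.
    // Assumption: the result is false when the task finished before the trigger arrived.
    public static CompletableFuture<Boolean> taskTrigger(boolean taskAlreadyFinished) {
        return CompletableFuture.completedFuture(!taskAlreadyFinished);
    }

    // Unlinked: the task-side future is dropped, so the caller always sees an
    // acknowledgement, even if the task has already finished.
    public static CompletableFuture<String> triggerUnlinked(boolean taskAlreadyFinished) {
        taskTrigger(taskAlreadyFinished); // result ignored
        return CompletableFuture.completedFuture("ACK");
    }

    // Linked: the returned future is derived from the task-side future, so a
    // trigger that never happened surfaces as a failure to the caller.
    public static CompletableFuture<String> triggerLinked(boolean taskAlreadyFinished) {
        return taskTrigger(taskAlreadyFinished).thenCompose(triggered -> {
            if (triggered) {
                return CompletableFuture.completedFuture("ACK");
            }
            CompletableFuture<String> failed = new CompletableFuture<>();
            failed.completeExceptionally(
                    new IllegalStateException("task finished before the checkpoint was triggered"));
            return failed;
        });
    }

    public static void main(String[] args) {
        System.out.println("unlinked, task finished: " + triggerUnlinked(true).join());
        System.out.println("linked, task finished, failed: "
                + triggerLinked(true).isCompletedExceptionally());
    }
}
```

In the unlinked variant the caller cannot tell that the trigger never reached a
running task and may keep waiting for a checkpoint that will never complete. In
the linked variant the failure is visible and the checkpoint could be declined
or retried.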