[ 
https://issues.apache.org/jira/browse/FLINK-31249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

renxiang zhou updated FLINK-31249:
----------------------------------
    Language:   (was: JAVA)

> Checkpoint Timer failed to process timeout events when it blocked at writing 
> _metadata to DFS
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-31249
>                 URL: https://issues.apache.org/jira/browse/FLINK-31249
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.11.6, 1.16.0
>            Reporter: renxiang zhou
>            Priority: Major
>             Fix For: 1.18.0
>
>         Attachments: image-2023-02-28-11-25-03-637.png
>
>
> The jobmanager-future thread may block while writing _metadata to DFS 
> because of a DFS failure, and it holds the CheckpointCoordinator lock for 
> as long as it is blocked. When the next checkpoint is triggered, the 
> Checkpoint Timer thread waits for that lock to be released. If the previous 
> checkpoint then times out, the Checkpoint Timer thread cannot run the 
> timeout event because it is still waiting for the lock. As a result, the 
> previous checkpoint cannot be cancelled.
> !image-2023-02-28-11-25-03-637.png|width=1144,height=248!
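
The contention described above can be reproduced in isolation. The following is a minimal sketch, not Flink code: the class CheckpointLockSketch, the method timeoutHandled, and the local coordinatorLock are hypothetical names, the stalled DFS write is simulated with Thread.sleep, and the timer is assumed to be a single-threaded scheduled executor. It only illustrates how one timer task that waits for a lock can keep a later timeout task from running.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CheckpointLockSketch {

    /**
     * Returns true if the simulated timeout event ran within waitMillis.
     * dfsStallMillis is how long the simulated _metadata write holds the lock.
     */
    public static boolean timeoutHandled(long dfsStallMillis, long waitMillis)
            throws InterruptedException {
        // Hypothetical stand-in for the CheckpointCoordinator lock.
        final Object coordinatorLock = new Object();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch timeoutRan = new CountDownLatch(1);

        // Plays the role of the jobmanager-future thread: it holds the lock
        // for the whole duration of the (simulated) stalled DFS write.
        Thread writer = new Thread(() -> {
            synchronized (coordinatorLock) {
                lockHeld.countDown();
                try {
                    Thread.sleep(dfsStallMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "jobmanager-future");
        writer.setDaemon(true);
        writer.start();
        lockHeld.await(); // make sure the writer owns the lock first

        // Assumption: the checkpoint timer is a single-threaded scheduler.
        ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "checkpoint-timer");
                    t.setDaemon(true);
                    return t;
                });

        // Triggering the next checkpoint needs the lock, so this task parks
        // the only timer thread until the writer releases it.
        timer.execute(() -> {
            synchronized (coordinatorLock) {
                // nothing to do; acquiring the lock is the point
            }
        });

        // The timeout event of the previous checkpoint is queued behind the
        // blocked trigger task and cannot run until that task finishes.
        timer.schedule(timeoutRan::countDown, 50, TimeUnit.MILLISECONDS);

        boolean handled = timeoutRan.await(waitMillis, TimeUnit.MILLISECONDS);
        timer.shutdownNow();
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("write stalled:  handled=" + timeoutHandled(1000, 300));
        System.out.println("write returned: handled=" + timeoutHandled(100, 1500));
    }
}
```

In this sketch the timeout event only runs once the writer releases the lock. Running timeout handling on a thread that never waits for the lock, or keeping the blocking write outside the locked region, would remove the dependency; whether either matches the fix chosen for this issue is not stated in this message.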



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
