[ 
https://issues.apache.org/jira/browse/FLINK-38325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018271#comment-18018271
 ] 

Zakelly Lan commented on FLINK-38325:
-------------------------------------

Hey [~lucasgameiroborges], thanks for reporting this. From the screenshots you 
provided, I can't tell whether there is a deadlock.

Would you please provide the TM's thread dump taken while the checkpoint is in 
progress? If the file is too large to attach, you may split it into smaller 
parts using a tool like {{zip}}, or send a mail to u...@flink.apache.org and 
attach the thread dump. Thanks!


> Checkpoints are hanging and timing out frequently
> -------------------------------------------------
>
>                 Key: FLINK-38325
>                 URL: https://issues.apache.org/jira/browse/FLINK-38325
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 2.0.0, 2.1.0
>         Environment: Flink version 2.1 (also observed on 2.0) with the ForSt 
> state backend.
> Running on Kubernetes using the Apache Flink Kubernetes Operator.
>            Reporter: Lucas Borges
>            Priority: Major
>         Attachments: Screenshot 2025-09-03 at 14.53.56.png, Screenshot 
> 2025-09-03 at 14.54.21.png, Screenshot 2025-09-03 at 14.54.36.png
>
>
> This issue is being observed on a Flink 2.1 job running with the ForSt state 
> backend. We noticed that checkpoints fail due to timeouts/hanging more 
> frequently than in our Flink 1.x jobs. 
> We suspect there may be a deadlock somewhere, based on one TaskManager's 
> thread dump (we could not attach it to the Jira issue due to size limits).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
