[ https://issues.apache.org/jira/browse/FLINK-38325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018271#comment-18018271 ]
Zakelly Lan commented on FLINK-38325:
-------------------------------------

Hey [~lucasgameiroborges], thanks for reporting this. From the screenshots you provided, I can't tell whether there is a deadlock. Could you please provide the TM's thread dump taken during the checkpoint? You may split the file into smaller ones using tools like {{zip}}, or send a mail to u...@flink.apache.org with the thread dump attached.

Thanks

> Checkpoints are hanging and timing out frequently
> -------------------------------------------------
>
>                 Key: FLINK-38325
>                 URL: https://issues.apache.org/jira/browse/FLINK-38325
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 2.0.0, 2.1.0
>         Environment: Flink version 2.1 (also observed on 2.0) with the ForSt
> state backend.
> Running on Kubernetes using the Apache Flink Kubernetes Operator.
>            Reporter: Lucas Borges
>            Priority: Major
>         Attachments: Screenshot 2025-09-03 at 14.53.56.png, Screenshot
> 2025-09-03 at 14.54.21.png, Screenshot 2025-09-03 at 14.54.36.png
>
>
> This issue is being observed on a Flink 2.1 job running with the ForSt state
> backend. We noticed that checkpoints hang or fail with timeouts more
> frequently than in Flink 1.x jobs.
> We suspect there may be a deadlock somewhere, based on one task manager's
> thread dump (it could not be attached to the Jira issue due to size limits).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)