Lucas Borges created FLINK-38325:
------------------------------------

             Summary: Checkpoints are hanging and timing out frequently
                 Key: FLINK-38325
                 URL: https://issues.apache.org/jira/browse/FLINK-38325
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 2.0.0, 2.1.0
         Environment: Flink version 2.1 (also observed on 2.0) with the ForSt 
state backend.
Running on Kubernetes using the Apache Flink Kubernetes Operator.
            Reporter: Lucas Borges
         Attachments: Screenshot 2025-09-03 at 14.53.56.png, Screenshot 
2025-09-03 at 14.54.21.png, Screenshot 2025-09-03 at 14.54.36.png

This issue is observed on a Flink 2.1 job running with the ForSt state 
backend. Checkpoints on this job hang and fail with timeouts more frequently 
than on our Flink 1.x jobs.

We suspect a deadlock somewhere, based on a thread dump from one TaskManager 
(the dump could not be attached to the Jira issue due to size limits).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
