[
https://issues.apache.org/jira/browse/FLINK-30553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654855#comment-17654855
]
Xintong Song commented on FLINK-30553:
--------------------------------------
Based on the information provided, I'm not sure whether
{{DFSOutputStream.waitForAckedSeqno()}} is the cause of the problem. As shown
in your screenshot, it calls {{wait()}} with a timeout of 1000 ms. This should not
cause the thread to be blocked for days.
If this is indeed the problem, then you should probably open a ticket in the
Hadoop project rather than in Flink.
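To make the timeout reasoning concrete, here is a minimal, self-contained Java sketch. It is not Hadoop or Flink code: the class {{TimedWaitSketch}} and both method names are hypothetical, and the loop condition is only a stand-in for "the awaited event has not happened yet". It shows two general JVM behaviours: a single timed {{wait()}} returns on its own once the timeout elapses, and a caller that re-checks a condition in a loop comes back to the same timed {{wait()}} over and over, so a thread dump taken at any moment would most likely show TIMED_WAITING at that frame.

```java
// Illustrative sketch only; TimedWaitSketch is a hypothetical name, not part of Hadoop or Flink.
class TimedWaitSketch {
    private static final Object LOCK = new Object();

    // Performs one timed wait that nobody notifies and returns the elapsed milliseconds.
    static long singleTimedWaitMillis(long timeoutMs) throws InterruptedException {
        long start = System.nanoTime();
        synchronized (LOCK) {
            LOCK.wait(timeoutMs); // returns by itself after roughly timeoutMs
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    // Repeats the timed wait until totalMs has passed and counts how often it woke up.
    // The deadline check stands in for a condition that has not become true yet.
    static int wakeupsWhileConditionFalse(long timeoutMs, long totalMs) throws InterruptedException {
        long deadline = System.nanoTime() + totalMs * 1_000_000L;
        int wakeups = 0;
        synchronized (LOCK) {
            while (System.nanoTime() < deadline) {
                LOCK.wait(timeoutMs); // each call is short; the loop as a whole is not
                wakeups++;
            }
        }
        return wakeups;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("single wait took ms: " + singleTimedWaitMillis(200));
        System.out.println("wakeups in loop: " + wakeupsWhileConditionFalse(50, 300));
    }
}
```

Several jstack dumps taken a few seconds apart, together with the full stack of the waiting thread, would help to tell a single bounded wait apart from a loop that keeps waiting on a condition.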
> checkpoint always IN-PROGRESS because of hdfs
> ---------------------------------------------
>
> Key: FLINK-30553
> URL: https://issues.apache.org/jira/browse/FLINK-30553
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.14.4
> Environment: !微信图片_20230104140754.jpg!
> Reporter: linqichen
> Priority: Critical
> Attachments: 微信图片_20230104140754.jpg, 微信图片_20230104140840.jpg,
> 微信图片_20230104140848.jpg, 微信图片_20230104140857.jpg, 微信图片_20230104140903.jpg
>
>
> Hi, I have found a serious problem. My Flink job has not completed a checkpoint
> since 2022-12-24 (it is now 2023-01-04), although one should run every 5 minutes.
> The last checkpoint's status is "IN-PROGRESS", but all TaskManagers have
> finished their own work. I ran jstack on the JobManager and found a thread in
> state "TIMED_WAITING" while executing "DFSOutputStream.waitForAckedSeqno()".
> Because my company does not allow copying material to a public environment, I
> took some photos instead.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)