[jira] [Commented] (FLINK-30553) checkpoint always IN-PROGRESS because of hdfs

Xintong Song (Jira) Thu, 05 Jan 2023 00:24:05 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-30553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654855#comment-17654855
 ]


Xintong Song commented on FLINK-30553:
--------------------------------------

Based on the information provided, I'm not sure whether 
{{DFSOutputStream.waitForAckedSeqno()}} is the cause of the problem. As shown 
in your screenshot, it calls {{wait()}} with a timeout of 1000 ms. This should 
not cause the thread to be blocked for days.

If this is indeed the problem, then you should probably open a ticket in the 
Hadoop project rather than in Flink.
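
For illustration, here is a minimal Java sketch of a timed wait that sits 
inside a loop. It is not the Hadoop source: the class TimedWaitDemo, the acked 
flag, and the methods waitForAck(), ack() and wakeups() are hypothetical names 
chosen for this example. A single wait(timeout) returns after roughly the 
timeout, so a thread that is seen in TIMED_WAITING over a long period would 
have to be re-entering the wait, for example because the condition it loops on 
never becomes true.

```java
public class TimedWaitDemo {
    private final Object lock = new Object();
    private boolean acked = false; // hypothetical stand-in for "ack received"
    private int wakeups = 0;       // how many times wait() has returned

    // Waits until ack() is called. Each wait(timeoutMs) returns on notify or
    // after about timeoutMs, but the loop re-enters it while acked is false.
    public void waitForAck(long timeoutMs) throws InterruptedException {
        synchronized (lock) {
            while (!acked) {
                lock.wait(timeoutMs);
                wakeups++;
            }
        }
    }

    // Sets the condition and wakes up any waiting thread.
    public void ack() {
        synchronized (lock) {
            acked = true;
            lock.notifyAll();
        }
    }

    public int wakeups() {
        synchronized (lock) {
            return wakeups;
        }
    }

    public static void main(String[] args) throws Exception {
        TimedWaitDemo demo = new TimedWaitDemo();
        Thread waiter = new Thread(() -> {
            try {
                demo.waitForAck(50); // short timeout so the demo runs quickly
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "waiter");
        waiter.start();

        Thread.sleep(500);
        // Usually TIMED_WAITING here, although the sample is racy.
        System.out.println("state while unacked: " + waiter.getState());
        System.out.println("wait() returns so far: " + demo.wakeups());

        demo.ack();
        waiter.join();
        System.out.println("state after ack: " + waiter.getState());
    }
}
```

With such a loop, every thread dump taken before ack() is called would 
typically show the waiter in TIMED_WAITING, so comparing several jstack 
samples taken some time apart can help tell a stuck loop from one short wait.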

> checkpoint always IN-PROGRESS because of hdfs
> ---------------------------------------------
>
>                 Key: FLINK-30553
>                 URL: https://issues.apache.org/jira/browse/FLINK-30553
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.4
>         Environment: !微信图片_20230104140754.jpg!
>            Reporter: linqichen
>            Priority: Critical
>         Attachments: 微信图片_20230104140754.jpg, 微信图片_20230104140840.jpg, 
> 微信图片_20230104140848.jpg, 微信图片_20230104140857.jpg, 微信图片_20230104140903.jpg
>
>
> Hey, I found a big problem. My Flink job has not completed a checkpoint since 
> 2022-12-24 (it is now 2023-01-04), although it should take one every 5 
> minutes. The last checkpoint's status is "IN-PROGRESS", but all TaskManagers 
> have finished their part. I ran jstack on the JobManager and found a thread 
> in state "TIMED_WAITING" while executing 
> "DFSOutputStream.waitForAckedSeqno()".
> Because my company does not allow copying things to a public environment, I 
> took some photos instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
