[jira] [Commented] (FLINK-23189) Count and fail the task when the disk is error on JobManager

zlzhang0122 (Jira) Tue, 21 Sep 2021 23:06:05 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418420#comment-17418420
 ]


zlzhang0122 commented on FLINK-23189:
-------------------------------------

[~pnowojski] Sure, I will look into it.

> Count and fail the task when the disk is error on JobManager
> ------------------------------------------------------------
>
>                 Key: FLINK-23189
>                 URL: https://issues.apache.org/jira/browse/FLINK-23189
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.12.2, 1.13.1
>            Reporter: zlzhang0122
>            Assignee: zlzhang0122
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>         Attachments: exception.txt
>
>
> When the JobManager disk has an error, triggerCheckpoint throws an 
> IOException and fails. This causes a TRIGGER_CHECKPOINT_FAILURE, but that 
> failure does not cause the job to fail. Users can hardly notice the error 
> unless they check the JobManager logs. To avoid this, I propose that we 
> identify these IOException cases and increment the failureCounter, so that 
> the job eventually fails.
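
The proposal above can be illustrated with a minimal sketch. It is not the
Flink implementation or the patch attached to this issue: the class
TriggerFailureCounter, its methods, and the tolerableFailures threshold are
hypothetical names chosen for illustration. It assumes only that IOException
causes count toward the limit and that a successful checkpoint resets the
count.

```java
import java.io.IOException;

/** Hypothetical sketch of counting trigger failures; not a Flink class. */
class TriggerFailureCounter {
    // Assumed threshold: how many counted failures are tolerated.
    private final int tolerableFailures;
    private int failureCounter;

    TriggerFailureCounter(int tolerableFailures) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
    }

    /**
     * Records a failed checkpoint trigger. Only IOException causes are
     * counted, mirroring the disk-error case described in the issue.
     *
     * @return true if the tolerated number of failures has been exceeded
     *         and the job should be failed
     */
    boolean onTriggerFailure(Throwable cause) {
        if (cause instanceof IOException) {
            failureCounter++;
        }
        return failureCounter > tolerableFailures;
    }

    /** Resets the counter after a checkpoint completes successfully. */
    void onCheckpointSuccess() {
        failureCounter = 0;
    }

    int getFailureCounter() {
        return failureCounter;
    }
}
```

With a threshold of 1, the first IOException is tolerated and the second one
signals that the job should fail. Causes of other types leave the counter
unchanged in this sketch.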



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
