[ 
https://issues.apache.org/jira/browse/FLINK-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-23189.
----------------------------------
    Fix Version/s: 1.14.0
     Release Note: 
IOExceptions thrown while triggering checkpoints on the Job Manager (for 
example, errors while creating directories) now count against the 
maximum number of tolerable checkpoint failures. 

The number of tolerable checkpoint failures can be adjusted or disabled via:
org.apache.flink.streaming.api.environment.CheckpointConfig#setTolerableCheckpointFailureNumber
(which is disabled by default).
       Resolution: Fixed

Merged to master as 8463d5e7f69..f8928792feb

> Count and fail the task when the disk is error on JobManager
> ------------------------------------------------------------
>
>                 Key: FLINK-23189
>                 URL: https://issues.apache.org/jira/browse/FLINK-23189
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.12.2, 1.13.1
>            Reporter: zlzhang0122
>            Assignee: zlzhang0122
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>         Attachments: exception.txt
>
>
> When the JobManager disk has an error, triggerCheckpoint throws an 
> IOException and fails. This causes a TRIGGER_CHECKPOINT_FAILURE, but the 
> failure does not cause the job to fail. Users can hardly notice this error 
> unless they read the JobManager logs. To avoid this, I propose that we 
> identify these IOException cases and increment the failureCounter, so that 
> the job eventually fails.
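
The counting behaviour proposed above can be illustrated with a small, self-contained Java sketch. It is a simplified model and not Flink's actual implementation: the class TolerableFailureCounter and its methods are hypothetical names, and the rule that a successful checkpoint resets the count is an assumption made for illustration. The constructor argument plays the role of the value passed to CheckpointConfig#setTolerableCheckpointFailureNumber mentioned in the release note.

```java
// Simplified, hypothetical model of a tolerable-checkpoint-failure counter.
// NOT Flink's implementation; all names here are illustrative only.
public class TolerableFailureCounter {

    // Plays the role of the value given to
    // CheckpointConfig#setTolerableCheckpointFailureNumber(int).
    private final int tolerableFailures;

    // Number of checkpoint failures since the last success (assumed semantics).
    private int continuousFailures = 0;

    public TolerableFailureCounter(int tolerableFailures) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
    }

    /**
     * Records one failed checkpoint, for example an IOException raised while
     * triggering the checkpoint on the JobManager.
     *
     * @return true if the failure budget is exhausted and the job should fail
     */
    public boolean onCheckpointFailure() {
        continuousFailures++;
        return continuousFailures > tolerableFailures;
    }

    /** Assumption: a completed checkpoint resets the failure streak. */
    public void onCheckpointSuccess() {
        continuousFailures = 0;
    }

    public static void main(String[] args) {
        TolerableFailureCounter counter = new TolerableFailureCounter(2);
        System.out.println(counter.onCheckpointFailure()); // false: 1 failure tolerated
        System.out.println(counter.onCheckpointFailure()); // false: 2 failures tolerated
        System.out.println(counter.onCheckpointFailure()); // true: budget exceeded
    }
}
```

In this model, a trigger-time IOException that is never counted would leave the job running indefinitely without checkpoints, which is the situation the issue describes.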



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
