[ https://issues.apache.org/jira/browse/FLINK-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Nowojski closed FLINK-23189.
----------------------------------
Fix Version/s: 1.14.0
Release Note:
IOExceptions thrown while triggering checkpoints on the JobManager (for
example, errors while creating directories) are now counted against the
maximum number of tolerable checkpoint failures.
The number of tolerable checkpoint failures can be adjusted or disabled via
org.apache.flink.streaming.api.environment.CheckpointConfig#setTolerableCheckpointFailureNumber
(it is disabled by default).
Resolution: Fixed
Merged to master as 8463d5e7f69..f8928792feb
> Count and fail the task when the disk is error on JobManager
> ------------------------------------------------------------
>
> Key: FLINK-23189
> URL: https://issues.apache.org/jira/browse/FLINK-23189
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.12.2, 1.13.1
> Reporter: zlzhang0122
> Assignee: zlzhang0122
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.14.0
>
> Attachments: exception.txt
>
>
> When the JobManager disk fails, triggerCheckpoint throws an IOException and
> fails. This causes a TRIGGER_CHECKPOINT_FAILURE, but that failure does not
> fail the job, so users can hardly notice the error unless they check the
> JobManager logs. To avoid this, I propose that we detect these IOException
> cases and increment the failureCounter, which can eventually fail the job.
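The proposal can be illustrated with a small, stdlib-only Java model. This is a hypothetical sketch and not Flink's actual CheckpointFailureManager. The class TolerableFailureSketch, the CheckpointTrigger interface and the counter names are invented for illustration. The model also makes two assumptions. First, the job fails once the number of consecutive failures exceeds the tolerable number. Second, a successful trigger stands in for a completed checkpoint and resets the counter.

```java
import java.io.IOException;

/**
 * Hypothetical, simplified model of the FLINK-23189 proposal: IOExceptions
 * thrown while triggering a checkpoint are counted as checkpoint failures,
 * and the job is failed once the tolerable failure number is exceeded.
 * This is NOT Flink's CheckpointFailureManager; all names are illustrative.
 */
public class TolerableFailureSketch {

    /** Hypothetical hook that triggers one checkpoint and may hit a disk error. */
    interface CheckpointTrigger {
        void trigger(long checkpointId) throws IOException;
    }

    private final int tolerableFailures;
    private int continuousFailures = 0;
    private boolean jobFailed = false;

    TolerableFailureSketch(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    void triggerCheckpoint(long checkpointId, CheckpointTrigger trigger) {
        if (jobFailed) {
            return; // a failed job triggers no further checkpoints
        }
        try {
            trigger.trigger(checkpointId);
            // Assumption: a successful trigger stands in for a completed
            // checkpoint and resets the consecutive-failure counter.
            continuousFailures = 0;
        } catch (IOException e) {
            // The proposed change: count the IOException instead of only logging it.
            continuousFailures++;
            if (continuousFailures > tolerableFailures) {
                jobFailed = true; // assumed threshold: fail once the count exceeds the limit
            }
        }
    }

    int getContinuousFailures() { return continuousFailures; }

    boolean isJobFailed() { return jobFailed; }

    public static void main(String[] args) {
        TolerableFailureSketch coordinator = new TolerableFailureSketch(2);
        // Simulates a broken JobManager disk: every trigger fails with an IOException.
        CheckpointTrigger brokenDisk = id -> {
            throw new IOException("cannot create directory for checkpoint " + id);
        };
        for (long id = 1; id <= 3; id++) {
            coordinator.triggerCheckpoint(id, brokenDisk);
            System.out.println("checkpoint " + id
                    + ": failures=" + coordinator.getContinuousFailures()
                    + ", jobFailed=" + coordinator.isJobFailed());
        }
    }
}
```

With a tolerable number of 2, the third consecutive IOException marks the job as failed ("checkpoint 3: failures=3, jobFailed=true"). Without the counting step, the same errors would surface only in the JobManager logs.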
--
This message was sent by Atlassian Jira
(v8.3.4#803005)