zlzhang0122 created FLINK-23189:
-----------------------------------
Summary: Count and fail the task when the JobManager disk has
errors
Key: FLINK-23189
URL: https://issues.apache.org/jira/browse/FLINK-23189
Project: Flink
Issue Type: Improvement
Affects Versions: 1.13.1, 1.12.2
Reporter: zlzhang0122
When the JobManager disk has an error, triggerCheckpoint throws an
IOException and fails. This results in a TRIGGER_CHECKPOINT_FAILURE, but that
failure does not cause the job to fail. Users can hardly notice the error
unless they read the JobManager logs. To avoid this, I propose that we
identify these IOException cases and increment the failureCounter, so that
the job eventually fails.
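
The proposal could be sketched roughly as follows. This is only an illustration of the idea, not Flink's actual checkpoint failure handling: the class TriggerFailureCounter, its methods, the tolerableFailures threshold, and the reset on a successful checkpoint are all hypothetical names and assumptions introduced here.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of the proposal; not Flink's real implementation.
 * It counts checkpoint-trigger failures caused by an IOException and reports
 * when the job should be failed.
 */
class TriggerFailureCounter {
    // Assumed threshold: how many counted failures are tolerated.
    private final int tolerableFailures;
    private int failureCounter;

    TriggerFailureCounter(int tolerableFailures) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
    }

    /**
     * Records a failed checkpoint trigger.
     * Returns true once the counter exceeds the tolerated number of failures.
     */
    boolean onTriggerFailure(Throwable cause) {
        if (hasIOExceptionCause(cause)) {
            failureCounter++;
        }
        return failureCounter > tolerableFailures;
    }

    /** Assumption: a successful checkpoint resets the counter. */
    void onCheckpointSuccess() {
        failureCounter = 0;
    }

    int getFailureCounter() {
        return failureCounter;
    }

    /** Walks the cause chain, since the IOException may be wrapped. */
    private static boolean hasIOExceptionCause(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof IOException) {
                return true;
            }
        }
        return false;
    }
}
```

With a threshold of 1, for example, the first IOException-caused trigger failure is only counted, and the second one makes onTriggerFailure return true, at which point the caller would fail the job. Failures with other causes leave the counter unchanged in this sketch.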