Hi,

We are running into intermittent checkpoint failures while checkpointing to S3.
As described in this thread, http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/1-5-some-thing-weird-td21309.html, we see that the job restarts when it encounters such a failure. The thread mentions an option to not fail tasks on checkpoint errors: CheckpointConfig#setFailOnCheckpointingErrors(false). However, as I understand it, this would mean that the job keeps running even when checkpoints fail persistently. Is my understanding correct?

If so, is there a way to configure an allowable number of checkpoint failures? That is, something along the lines of "don't fail the job if there are <= X checkpoint failures", so that only transient failures are ignored.

Thanks,
Lakshmi
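
P.S. To make the request concrete, here is a minimal sketch of the behaviour I have in mind. The class CheckpointFailureBudget and its methods are hypothetical and not part of Flink; the sketch also assumes that "X" counts consecutive failures and that a successful checkpoint resets the count, which is only one possible reading of the requirement.

```java
// Hypothetical helper, NOT a Flink API: it only illustrates the requested semantics.
// Assumption: "X" bounds consecutive failures, and a success resets the streak.
final class CheckpointFailureBudget {
    private final int tolerableFailures; // the "X" from the question
    private int consecutiveFailures = 0;

    CheckpointFailureBudget(int tolerableFailures) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
    }

    /** Records a completed checkpoint; the failure streak starts over. */
    void onCheckpointSuccess() {
        consecutiveFailures = 0;
    }

    /** Records a failed checkpoint; returns true once the job should fail. */
    boolean onCheckpointFailure() {
        consecutiveFailures++;
        return consecutiveFailures > tolerableFailures;
    }

    /** Current length of the failure streak. */
    int consecutiveFailures() {
        return consecutiveFailures;
    }
}
```

With X = 2, two failed checkpoints in a row would be tolerated and the third consecutive failure would fail the job, while any successful checkpoint in between would reset the count.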