yanghua commented on issue #8322: [FLINK-12364] Introduce a CheckpointFailureManager to centralized manage checkpoint failure
URL: https://github.com/apache/flink/pull/8322#issuecomment-493826613

@StefanRRichter After thinking about this seriously, my point of view has changed. There are two options:

* to keep the default behavior of all test cases and user jobs, the default value of `tolerableCheckpointFailureNumber` should be `Integer.MAX_VALUE`;
* to keep compatibility, the default value of `tolerableCheckpointFailureNumber` should be 0.

The key problem is that the **range of action** of the two config options is different. The `failOnCheckpointingErrors` option only covers the task's checkpointing errors on the TaskManager side (not even the whole execution phase), while the `tolerableCheckpointFailureNumber` option needs to cover the whole trigger and execution phases.

The reasoning behind option 1: in many scenarios, setting the default value of `tolerableCheckpointFailureNumber` to 0 would change the behavior of users' jobs, making them fail and restart more frequently. For example, when a task is not yet ready to checkpoint, it sends a decline message, which triggers the failure manager to fail and restart the job. That changes the default behavior of the test cases and of user jobs. This is why I changed the default value to `Integer.MAX_VALUE`, even though such failures are sporadic.

The reasoning behind option 2: if we set the default value of `tolerableCheckpointFailureNumber` to `Integer.MAX_VALUE`, it may introduce a **compatibility issue**: how do we handle the `failOnCheckpointingErrors` config option in the future? The original idea was that `failOnCheckpointingErrors` (two values: 0 and `Integer.MAX_VALUE`) is a subset of `tolerableCheckpointFailureNumber` (0 to `Integer.MAX_VALUE`). We had wanted to deprecate the `failOnCheckpointingErrors` option in the third step.

What do you think?
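
To make the "subset" idea concrete, here is a minimal sketch of how the two options could relate. It is not the code in this PR: the class `TolerableFailureSketch`, its method names, and the choice to count consecutive failures (reset on success) are all assumptions made for illustration. The mapping direction (`true` to 0, `false` to `Integer.MAX_VALUE`) is inferred from the option names.

```java
/**
 * Hypothetical sketch only; NOT Flink's actual CheckpointFailureManager.
 * All names here are illustrative assumptions.
 */
class TolerableFailureSketch {

    /**
     * Assumed mapping of the boolean failOnCheckpointingErrors onto the
     * numeric option: true tolerates nothing, false tolerates everything.
     */
    static int fromFailOnCheckpointingErrors(boolean failOnCheckpointingErrors) {
        return failOnCheckpointingErrors ? 0 : Integer.MAX_VALUE;
    }

    private final int tolerableCheckpointFailureNumber;

    // long, so the counter cannot overflow past Integer.MAX_VALUE
    private long consecutiveFailures;

    TolerableFailureSketch(int tolerableCheckpointFailureNumber) {
        if (tolerableCheckpointFailureNumber < 0) {
            throw new IllegalArgumentException("tolerable number must be >= 0");
        }
        this.tolerableCheckpointFailureNumber = tolerableCheckpointFailureNumber;
    }

    /** Records one failure; returns true if the job should now be failed. */
    boolean onCheckpointFailure() {
        consecutiveFailures++;
        return consecutiveFailures > tolerableCheckpointFailureNumber;
    }

    /** Assumption: a successful checkpoint resets the failure count. */
    void onCheckpointSuccess() {
        consecutiveFailures = 0;
    }
}
```

Under these assumptions, a default of 0 fails the job on the very first declined checkpoint (the behavior change described for option 1), while a default of `Integer.MAX_VALUE` never fails it.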
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services