yanghua commented on issue #8322: [FLINK-12364] Introduce a 
CheckpointFailureManager to centralized manage checkpoint failure
URL: https://github.com/apache/flink/pull/8322#issuecomment-493826613
 
 
   @StefanRRichter After thinking it over carefully, my point of view has changed.
   
   There are two options:
   
   * To preserve the default behavior of all test cases and user jobs, the 
default value of `tolerableCheckpointFailureNumber` should apparently be 
`Integer.MAX_VALUE`;
   * To preserve compatibility with the existing `failOnCheckpointingErrors` 
option, the default value of `tolerableCheckpointFailureNumber` should 
apparently be 0.
   
   The key problem is that the **range of action** of the two config options 
is different. The `failOnCheckpointingErrors` option only covers a task's 
checkpointing errors on the TaskManager side (and not even the whole execution 
phase), while the `tolerableCheckpointFailureNumber` option needs to cover the 
whole trigger and execution phases.
   
   The argument for option 1:
   
   In many scenarios, setting the default value of 
`tolerableCheckpointFailureNumber` to 0 would change the behavior of users' 
jobs: they would fail and restart more frequently. For example, a task that is 
not yet ready to checkpoint sends a decline message, which triggers the failure 
manager to fail and restart the job. That changes the default behavior of the 
test cases and user jobs. This is why I changed the default value to 
`Integer.MAX_VALUE`, even though such declines are sporadic.
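   
   To make the difference between the two defaults concrete, here is a minimal 
sketch of a failure counter. It is not the code in this PR: the class name 
`FailureCounterSketch`, its methods, and the choice to count consecutive 
failures and reset on success are all assumptions made for illustration.

```java
/** Hypothetical sketch, not Flink's actual CheckpointFailureManager. */
final class FailureCounterSketch {

    private final int tolerableCheckpointFailureNumber;
    private int consecutiveFailures;

    FailureCounterSketch(int tolerableCheckpointFailureNumber) {
        if (tolerableCheckpointFailureNumber < 0) {
            throw new IllegalArgumentException("tolerance must be >= 0");
        }
        this.tolerableCheckpointFailureNumber = tolerableCheckpointFailureNumber;
    }

    /**
     * Records one failed or declined checkpoint.
     *
     * @return true if the tolerance is exceeded and the job should fail
     */
    boolean onCheckpointFailure() {
        // Guard against int overflow so Integer.MAX_VALUE means "never fail".
        if (consecutiveFailures < Integer.MAX_VALUE) {
            consecutiveFailures++;
        }
        return consecutiveFailures > tolerableCheckpointFailureNumber;
    }

    /** Assumption: a successful checkpoint resets the counter. */
    void onCheckpointSuccess() {
        consecutiveFailures = 0;
    }

    public static void main(String[] args) {
        FailureCounterSketch strict = new FailureCounterSketch(0);
        FailureCounterSketch lenient = new FailureCounterSketch(Integer.MAX_VALUE);

        // A single decline message arrives in both setups.
        System.out.println(strict.onCheckpointFailure());  // prints "true"
        System.out.println(lenient.onCheckpointFailure()); // prints "false"
    }
}
```

   With a default of 0, a single sporadic decline already fails the job in 
this sketch; with `Integer.MAX_VALUE` it never does.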
   
   The argument for option 2:
   
   If we set the default value of `tolerableCheckpointFailureNumber` to 
`Integer.MAX_VALUE`, it may introduce a **compatibility issue**: how do we 
handle the `failOnCheckpointingErrors` config option in the future? The 
original idea was that `failOnCheckpointingErrors` (two values: 0 and 
`Integer.MAX_VALUE`) is a subset of `tolerableCheckpointFailureNumber` (0 to 
`Integer.MAX_VALUE`). We had wanted to deprecate the 
`failOnCheckpointingErrors` option in the third step.
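   
   The subset relationship could be sketched roughly as follows. The class 
and method names are hypothetical, and which boolean value maps to which 
number is my assumption (failing on errors meaning "tolerate zero failures"). 
The mapping also ignores the difference in range of action described above.

```java
/** Hypothetical sketch of mapping the legacy boolean onto the new option. */
final class LegacyOptionMappingSketch {

    private LegacyOptionMappingSketch() {}

    /**
     * Assumed mapping: failOnCheckpointingErrors = true tolerates no failure,
     * false tolerates an unlimited number of failures.
     */
    static int toTolerableCheckpointFailureNumber(boolean failOnCheckpointingErrors) {
        return failOnCheckpointingErrors ? 0 : Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(toTolerableCheckpointFailureNumber(true));  // prints "0"
        System.out.println(toTolerableCheckpointFailureNumber(false)); // prints "2147483647"
    }
}
```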
   
   What do you think?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
