StefanRRichter commented on a change in pull request #8322: [FLINK-12364]
Introduce a CheckpointFailureManager to centralized manage checkpoint failure
URL: https://github.com/apache/flink/pull/8322#discussion_r285086847
##########
File path:
flink-end-to-end-tests/flink-streaming-kafka-test-base/src/main/java/org/apache/flink/streaming/kafka/test/base/KafkaExampleUtil.java
##########
@@ -45,6 +45,7 @@ public static StreamExecutionEnvironment prepareExecutionEnv(ParameterTool param
env.getConfig().disableSysoutLogging();
env.getConfig().setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 10000));
env.enableCheckpointing(5000); // create a checkpoint every 5 seconds
+ env.getCheckpointConfig().setTolerableCheckpointFailureNumber(Integer.MAX_VALUE);
Review comment:
After a quick comparison with master and 1.7, I think there is at least
one problem, with the `DECLINED` case. It used to be treated like e.g. the
subsumed case and never led to a job failure. That makes sense, because this
cause only exists because the JM can start triggering checkpoints before
all tasks are running. This is (unfortunately) to be expected at the moment,
and it should not lead to a failover because it can happen regularly
at the beginning of a job. Wdyt? If you agree, let's also double-check the
other cases one more time.
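
To make the suggestion concrete, here is a minimal sketch of a failure counter that ignores declined checkpoints. It is not the code in this PR: the class `CheckpointFailureCounterSketch`, its `Reason` enum, and both method names are hypothetical, and which reasons other than `DECLINED` and subsumed should count is exactly what still needs to be double-checked.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; names and structure are assumptions, not Flink's API. */
public class CheckpointFailureCounterSketch {

    /** Hypothetical failure reasons, not Flink's actual enum. */
    enum Reason { DECLINED, SUBSUMED, EXPIRED, TASK_EXCEPTION }

    // Reasons assumed to be expected during normal operation; they never count.
    private static final Set<Reason> IGNORED =
            EnumSet.of(Reason.DECLINED, Reason.SUBSUMED);

    private final int tolerableFailures;
    private int continuousFailures;

    CheckpointFailureCounterSketch(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    /** Returns true if this checkpoint failure should trigger a job failover. */
    boolean onCheckpointFailure(Reason reason) {
        if (IGNORED.contains(reason)) {
            // E.g. a checkpoint declined because not all tasks are running yet.
            return false;
        }
        continuousFailures++;
        return continuousFailures > tolerableFailures;
    }

    /** A completed checkpoint resets the counter. */
    void onCheckpointSuccess() {
        continuousFailures = 0;
    }

    public static void main(String[] args) {
        CheckpointFailureCounterSketch counter = new CheckpointFailureCounterSketch(0);
        System.out.println(counter.onCheckpointFailure(Reason.DECLINED)); // false
        System.out.println(counter.onCheckpointFailure(Reason.EXPIRED));  // true
    }
}
```

With handling along these lines, the test would not need `Integer.MAX_VALUE` just to survive declined checkpoints at job start.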