StefanRRichter commented on a change in pull request #8322: [FLINK-12364] Introduce a CheckpointFailureManager to centralized manage checkpoint failure
URL: https://github.com/apache/flink/pull/8322#discussion_r285055737
 
 

 ##########
 File path: flink-tests/src/test/java/org/apache/flink/test/checkpointing/ZooKeeperHighAvailabilityITCase.java
 ##########
 @@ -187,6 +187,7 @@ public void testRestoreBehaviourWithFaultyStateHandles() throws Exception {
                env.setParallelism(1);
                env.setRestartStrategy(RestartStrategies.fixedDelayRestart(Integer.MAX_VALUE, 0));
                env.enableCheckpointing(10); // Flink doesn't allow lower than 10 ms
+               env.getCheckpointConfig().setTolerableCheckpointFailureNumber(Integer.MAX_VALUE);
 
 Review comment:
   All I get from the log is that it looks more like a deadlock. This might be
unrelated to your changes and have more to do with some changes to the network
stack. In that case, the "fix" was probably just a luckier run. But that would
actually be good news. I am wondering whether that also applies to the other
cases where you had to set the parameter to make the test pass (e.g.
`StatefulStreamingJob`)? If so, I would double-check the deadlock theory, and
you could revert the test changes. If this is all true, it is not the PR's
fault, but I would still like to be on the safe side before we merge.
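
For context on why the added setting can hide a problem, here is a minimal
sketch of a tolerable-failure counter. It assumes the semantics are "count
consecutive checkpoint failures, reset on success, fail the job once the count
exceeds the tolerable number". The class and method names are hypothetical and
this is not the actual `CheckpointFailureManager` code.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical, simplified model of a tolerable checkpoint failure counter.
 * Assumption: only consecutive failures count, and a success resets the count.
 */
public class TolerableFailureCounter {

    private final int tolerableFailures;
    private final AtomicInteger consecutiveFailures = new AtomicInteger(0);

    public TolerableFailureCounter(int tolerableFailures) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
    }

    /** Records a failed checkpoint; returns true if the job should now fail. */
    public boolean onCheckpointFailure() {
        // With Integer.MAX_VALUE as the limit this comparison can never be
        // true, so checkpoint failures alone never fail the job.
        return consecutiveFailures.incrementAndGet() > tolerableFailures;
    }

    /** Records a completed checkpoint and resets the consecutive count. */
    public void onCheckpointSuccess() {
        consecutiveFailures.set(0);
    }
}
```

Under these assumptions, a limit of `Integer.MAX_VALUE` means repeated
checkpoint failures are never surfaced as a job failure, which is why it is
worth confirming that the test changes are really needed.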

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
