Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/612#discussion_r28814574
--- Diff:
flink-tests/src/test/java/org/apache/flink/test/recovery/AbstractProcessFailureRecoveryTest.java
---
@@ -112,9 +112,9 @@ public void testTaskManagerProcessFailure() {
     Tuple2<String, Object> localAddress = new Tuple2<String, Object>("localhost", jobManagerPort);
     Configuration jmConfig = new Configuration();
-    jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 s");
-    jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4 s");
-    jmConfig.setInteger(ConfigConstants.AKKA_WATCH_THRESHOLD, 2);
+    jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 ms");
+    jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "20 s");
+    jmConfig.setInteger(ConfigConstants.AKKA_WATCH_THRESHOLD, 20);
--- End diff --
How long do the tests take now? With a pause of 20s and a threshold of 20,
how long does the JobManager take to realize that the TaskManager is down?
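
For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes the watch settings feed Akka's phi accrual failure detector, that the detector uses the logistic approximation of the normal CDF as I remember it from the Akka sources, that heartbeats arrived exactly at the configured interval before the failure, and that the standard deviation sits at a floor of 100 ms. The class and method names are made up for illustration and are not part of Flink or Akka.

```java
// Rough estimate only. Assumption: this mirrors the phi computation of Akka's
// phi accrual failure detector; it is not taken from the Flink code base.
final class DetectionTimeEstimate {

    /** Suspicion level phi after the given silence (all values in milliseconds). */
    static double phi(double millisSinceLastHeartbeat, double meanMillis, double stdDevMillis) {
        double y = (millisSinceLastHeartbeat - meanMillis) / stdDevMillis;
        // Logistic approximation of the cumulative normal distribution.
        double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
        return millisSinceLastHeartbeat > meanMillis
                ? -Math.log10(e / (1.0 + e))
                : -Math.log10(1.0 - 1.0 / (1.0 + e));
    }

    /**
     * First millisecond of silence at which phi reaches the threshold,
     * or -1 if that does not happen within the search window.
     * Assumption: the expected mean is heartbeat interval plus acceptable pause.
     */
    static long millisUntilDetection(long intervalMillis, long pauseMillis,
                                     double threshold, double stdDevMillis) {
        double mean = intervalMillis + pauseMillis;
        long limit = 10 * (intervalMillis + pauseMillis) + 10_000;
        for (long t = 0; t <= limit; t++) {
            if (phi(t, mean, stdDevMillis) >= threshold) {
                return t;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        double assumedStdDevMillis = 100.0; // assumed floor, not read from any config
        System.out.println("old settings (1 s, 4 s, 2):    "
                + millisUntilDetection(1_000, 4_000, 2, assumedStdDevMillis) + " ms");
        System.out.println("new settings (1 ms, 20 s, 20): "
                + millisUntilDetection(1, 20_000, 20, assumedStdDevMillis) + " ms");
    }
}
```

Under those assumptions the acceptable pause dominates: the old settings come out at a bit over 5 s of silence and the new ones at roughly 21 s, so each TaskManager failure in the test would cost about 15 s more before recovery even starts. Real heartbeat jitter would push both numbers up.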