[ https://issues.apache.org/jira/browse/FLINK-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119492#comment-16119492 ]
Till Rohrmann commented on FLINK-7352:
--------------------------------------

I think [~StephanEwen] is right and the problem is https://github.com/apache/flink/blob/master/flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/ExecutionGraphTestUtils.java#L203. You can simulate it by removing the sleep and introducing a small sleep in https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L401.

I think the solution would be to wait on the {{SimpleAckingTaskManagerGateway}} until it has received all task submissions before switching the {{Executions}} to running.

> ExecutionGraphRestartTest timeouts
> ----------------------------------
>
>                 Key: FLINK-7352
>                 URL: https://issues.apache.org/jira/browse/FLINK-7352
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, Tests
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Nico Kruber
>            Priority: Critical
>              Labels: test-stability
>
> Recently, I received timeouts from some tests in
> {{ExecutionGraphRestartTest}} like this
> {code}
> Tests in error:
>   ExecutionGraphRestartTest.testConcurrentLocalFailAndRestart:638 » Timeout
> {code}
> This particular instance is from 1.3.2 RC2 and stuck in
> {{ExecutionGraphTestUtils#waitUntilDeployedAndSwitchToRunning()}} but I also
> had instances stuck in {{ExecutionGraphTestUtils#waitUntilJobStatus}}.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
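
The fix proposed in the comment (wait on the gateway until every task submission has arrived, then switch the executions to running) could be sketched roughly as below. This is only an illustration of the idea using a {{CountDownLatch}}. The class {{SubmissionCountingGateway}} and its methods are hypothetical names invented for this sketch, not Flink's actual {{SimpleAckingTaskManagerGateway}} API, and the real change may look quite different.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a test gateway; NOT Flink's SimpleAckingTaskManagerGateway.
final class SubmissionCountingGateway {
    // One count per task submission the test expects to see.
    private final CountDownLatch pending;

    SubmissionCountingGateway(int expectedSubmissions) {
        this.pending = new CountDownLatch(expectedSubmissions);
    }

    // Called once per task submission, possibly from another thread.
    void submitTask(String taskName) {
        pending.countDown();
    }

    // Blocks until all expected submissions have arrived; throws if the timeout elapses first.
    void awaitAllSubmissions(long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        if (!pending.await(timeout, unit)) {
            throw new TimeoutException(
                "still waiting for " + pending.getCount() + " submission(s)");
        }
    }

    // Number of submissions that have not arrived yet.
    long outstanding() {
        return pending.getCount();
    }
}
```

A test helper would call {{awaitAllSubmissions}} first and only then switch the executions to running, so the outcome no longer depends on a sleep or on how the deploying thread happens to be scheduled.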