[
https://issues.apache.org/jira/browse/FLINK-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119492#comment-16119492
]
Till Rohrmann commented on FLINK-7352:
--------------------------------------
I think [~StephanEwen] is right and the problem is
https://github.com/apache/flink/blob/master/flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/ExecutionGraphTestUtils.java#L203.
You can reproduce it by removing that sleep and adding a short sleep at
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L401.
I think the solution would be to wait on the {{SimpleAckingTaskManagerGateway}}
until it has received all task submissions before switching the {{Executions}}
to running.
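
A minimal sketch of that idea, not actual Flink code: the gateway counts down a latch on every task submission, and the test helper blocks on the latch (with a timeout) instead of sleeping. {{CountingAckGateway}}, {{awaitAllSubmissions}} and {{deployAndWait}} are hypothetical names used only for illustration; the real {{SimpleAckingTaskManagerGateway}} and {{ExecutionGraphTestUtils}} have different signatures.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SubmissionLatchSketch {

    /** Hypothetical stand-in for a test gateway that acknowledges every submission. */
    public static final class CountingAckGateway {
        private final CountDownLatch submissions;

        public CountingAckGateway(int expectedSubmissions) {
            this.submissions = new CountDownLatch(expectedSubmissions);
        }

        /** Called by the (simulated) deployment path, possibly from another thread. */
        public void submitTask(String taskName) {
            submissions.countDown();
        }

        /** Blocks until all expected submissions have arrived or the timeout expires. */
        public void awaitAllSubmissions(long timeoutMillis)
                throws InterruptedException, TimeoutException {
            if (!submissions.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new TimeoutException(
                        submissions.getCount() + " task submission(s) still pending");
            }
        }

        public long pendingSubmissions() {
            return submissions.getCount();
        }
    }

    /**
     * Simulates an asynchronous deployment of {@code tasks} tasks, each delayed by
     * {@code delayMillis}, and waits for all of them. Returns the number of
     * submissions still pending once the wait has completed (0 on success).
     */
    public static long deployAndWait(int tasks, long delayMillis, long timeoutMillis)
            throws Exception {
        CountingAckGateway gateway = new CountingAckGateway(tasks);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                // The delay plays the role of the "small sleep" in the deployment path.
                executor.submit(() -> {
                    Thread.sleep(delayMillis);
                    gateway.submitTask("task-" + id);
                    return null;
                });
            }
            // Only after this returns would the test switch the executions to running.
            gateway.awaitAllSubmissions(timeoutMillis);
            return gateway.pendingSubmissions();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pending after wait: " + deployAndWait(4, 10L, 5000L));
    }
}
```

Waiting on a latch makes the test independent of how long the submission takes, whereas a fixed sleep only hides the race when the deployment happens to be fast enough.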
> ExecutionGraphRestartTest timeouts
> ----------------------------------
>
> Key: FLINK-7352
> URL: https://issues.apache.org/jira/browse/FLINK-7352
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination, Tests
> Affects Versions: 1.4.0, 1.3.2
> Reporter: Nico Kruber
> Priority: Critical
> Labels: test-stability
>
> Recently, I received timeouts from some tests in
> {{ExecutionGraphRestartTest}} like this
> {code}
> Tests in error:
> ExecutionGraphRestartTest.testConcurrentLocalFailAndRestart:638 » Timeout
> {code}
> This particular instance is from 1.3.2 RC2 and was stuck in
> {{ExecutionGraphTestUtils#waitUntilDeployedAndSwitchToRunning()}} but I also
> had instances stuck in {{ExecutionGraphTestUtils#waitUntilJobStatus}}.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)