Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/4364#discussion_r128512830
--- Diff:
flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/ExecutionGraphTestUtils.java
---
@@ -189,6 +189,35 @@ public static void finishAllVertices(ExecutionGraph
eg) {
}
}
+ /**
+ * Turns a newly scheduled execution graph into a state where all
vertices run.
+ * This waits until all executions have reached state 'DEPLOYING' and
then switches them to running.
+ */
+ public static void waitUntilDeployedAndSwitchToRunning(ExecutionGraph
eg, long timeout) throws TimeoutException {
+ // wait until everything is running
+ for (ExecutionVertex ev : eg.getAllExecutionVertices()) {
+ final Execution exec = ev.getCurrentExecutionAttempt();
+ waitUntilExecutionState(exec, ExecutionState.DEPLOYING,
timeout);
+ }
+
+ // Note: As ugly as it is, we need this minor sleep, because
between switching
+ // to 'DEPLOYED' and when the 'switchToRunning()' may be called
lies a race check
+ // against concurrent modifications (cancel / fail). We can
only switch this to running
+ // once that check is passed. For the actual runtime, this
switch is triggered by a callback
+ // from the TaskManager, which comes strictly after that. For
tests, we use mock TaskManagers
+ // which cannot easily tell us when that condition has
happened, unfortunately.
+ try {
+ Thread.sleep(2);
--- End diff --
In very rare cases, it might. I want to change the `Execution` a bit on the
`master` to make this unnecessary.
However, that is too much surgery in a critical part for a bugfix release,
so I decided to be conservative in the runtime code and rather pay this price
in the tests.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---