[ https://issues.apache.org/jira/browse/FLINK-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805079#comment-16805079 ]
Till Rohrmann commented on FLINK-12048:
---------------------------------------
The problem is a callback to {{#onAddedJobGraph}} on the second
{{Dispatcher}} that is delayed so long that it only executes after the second
{{Dispatcher}} has been shut down. Here is a commit with which one can
reproduce the interleaving locally:
https://github.com/tillrohrmann/flink/commit/f361cdec484e707061e0cbbd727f417fbe60e8b7.
As part of FLINK-11843 I want to rework the {{Dispatcher}} so that it only
runs while it holds the leadership and not while it is on standby. This could
fix the problem. Moreover, we should make sure that no concurrent operations
are still ongoing when we terminate the {{Dispatcher}}.
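To illustrate the last point, here is a minimal sketch of one way a component could refuse late callbacks and wait for in-flight operations before it terminates. The class {{GuardedComponent}} and its methods {{track}}, {{runIfRunning}} and {{closeAsync}} are hypothetical names made up for this example. This is not the actual {{Dispatcher}} code and not necessarily what FLINK-11843 will do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for a component such as a Dispatcher; not Flink code. */
final class GuardedComponent {
    // Asynchronous operations that were started while the component was running.
    private final List<CompletableFuture<?>> inFlight = new ArrayList<>();
    private boolean running = true;

    /** Registers an asynchronous operation; returns false once shutdown has begun. */
    synchronized boolean track(CompletableFuture<?> operation) {
        if (!running) {
            return false;
        }
        inFlight.add(operation);
        return true;
    }

    /**
     * Callback guard: runs the action (while holding the lock) only if the
     * component is still running, so a delayed callback becomes a no-op.
     */
    synchronized boolean runIfRunning(Runnable action) {
        if (!running) {
            return false;
        }
        action.run();
        return true;
    }

    /**
     * Stops accepting work. The returned future completes once every tracked
     * operation has completed, whether normally or exceptionally.
     */
    synchronized CompletableFuture<Void> closeAsync() {
        running = false;
        return CompletableFuture.allOf(inFlight.toArray(new CompletableFuture<?>[0]));
    }
}
```

In this sketch a callback that arrives after {{closeAsync}} is dropped instead of hitting a component that is no longer running, and the termination future is only done once all tracked operations have finished.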
> ZooKeeperHADispatcherTest failed on Travis
> ------------------------------------------
>
> Key: FLINK-12048
> URL: https://issues.apache.org/jira/browse/FLINK-12048
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination, Tests
> Affects Versions: 1.9.0
> Reporter: Chesnay Schepler
> Priority: Critical
> Labels: test-stability
>
> https://travis-ci.org/apache/flink/builds/512077301
> {code}
> 01:14:56.351 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.671 s <<< FAILURE! - in org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest
> 01:14:56.364 [ERROR] testStandbyDispatcherJobExecution(org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest) Time elapsed: 1.209 s <<< ERROR!
> org.apache.flink.runtime.util.TestingFatalErrorHandler$TestingException: org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the added job d51eeb908f360e44c0a2004e00a6afd2
>     at org.apache.flink.runtime.dispatcher.ZooKeeperHADispatcherTest.teardown(ZooKeeperHADispatcherTest.java:117)
> Caused by: org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the added job d51eeb908f360e44c0a2004e00a6afd2
> Caused by: java.lang.IllegalStateException: Not running. Forgot to call start()?
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)