[ https://issues.apache.org/jira/browse/FLINK-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636033#comment-16636033 ]
Thomas Wozniakowski commented on FLINK-10475:
---------------------------------------------
Aha - so it appears to be the ZooKeeper version. With a *3.5.3-beta* quorum
the failover silently never happens; with *3.5.4-beta* it works as intended.
Maybe it's worth adding a client-side check that refuses to start when
connecting to a *3.5.3-beta* quorum?
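
To make the suggestion concrete, here is a rough sketch of such a check, written as a standalone pre-flight script rather than as Flink code. It is not how Flink does (or would do) this. It assumes the quorum answers ZooKeeper's "srvr" four-letter command and that the reply starts with a "Zookeeper version: ..." line. The names KNOWN_BAD_VERSIONS, parse_zk_version, query_srvr and check_quorum are made up for illustration, and the list of bad versions contains only what was observed in this ticket.

```python
import re
import socket

# Versions observed in this ticket to break failover (assumption: exact match only).
KNOWN_BAD_VERSIONS = {"3.5.3-beta"}


def parse_zk_version(srvr_reply):
    """Return e.g. '3.5.3-beta' from a 'srvr' reply, or None if not found."""
    match = re.search(
        r"Zookeeper version:\s*(\d+\.\d+\.\d+(?:-(?:alpha|beta))?)", srvr_reply
    )
    return match.group(1) if match else None


def query_srvr(host, port=2181, timeout=5.0):
    """Send the 'srvr' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def check_quorum(hosts, query=query_srvr):
    """Raise RuntimeError if any quorum member reports a known-bad version."""
    for host in hosts:
        version = parse_zk_version(query(host))
        if version in KNOWN_BAD_VERSIONS:
            raise RuntimeError(
                "ZooKeeper %s on %s is known to break HA failover" % (version, host)
            )
```

Calling check_quorum(["zk1", "zk2", "zk3"]) (hypothetical host names) before starting a JobManager would abort on a *3.5.3-beta* member. A member whose version cannot be parsed is let through here; a stricter check could reject that case too.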
> Standalone HA - Leader election is not triggered on loss of leader
> ------------------------------------------------------------------
>
> Key: FLINK-10475
> URL: https://issues.apache.org/jira/browse/FLINK-10475
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.6.1, 1.5.4
> Reporter: Thomas Wozniakowski
> Priority: Blocker
> Attachments: t1.log, t2.log, t3.log
>
>
> Hey Guys,
> Just testing the new bugfix release of 1.5.4 (edit: also happens with 1.6.1).
> Happy to see that the issue of jobgraphs hanging around forever has been
> resolved in standalone/zookeeper HA mode, but now I'm seeing a different
> issue.
> It looks like the HA failover is never triggered. I set up a 3/3/3 cluster of
> zookeeper/jobmanager/taskmanagers. Started my job, all fine with the new
> version. I then proceeded to kill the leading jobmanager to test the failover.
> The remaining jobmanagers never triggered a leader election, and simply got
> stuck.
> Please give me a shout if I can provide any more useful information.
> EDIT
> Jobmanager logs attached below. You can see that I brought up a fresh
> cluster, one JM was elected leader (no taskmanagers or actual jobs in this
> case). I then let the cluster sit there for half an hour or so, before
> killing the leader. The log files are snapshotted maybe half an hour after
> that. You can see that a second election was never triggered.
> In case it's useful, our zookeeper quorum is running "3.5.3-beta". This setup
> previously worked with 1.4.3.