[
https://issues.apache.org/jira/browse/FLINK-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937848#comment-15937848
]
ASF GitHub Bot commented on FLINK-6174:
---------------------------------------
Github user WangTaoTheTonic commented on the issue:
https://github.com/apache/flink/pull/3599
Thanks for your comments, @wenlong88.
I also thought about adding retry logic for ZooKeeper failover, but that
would require modifying `LeaderLatch` in Curator, which is a third-party
library; otherwise we could only add our own private LeaderLatch by copying
most of the implementation from Curator.
Even with this AlwaysLeaderService added, JM failover still works, because
the RM will start a new instance.
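
To make the idea concrete, here is a rough sketch of what an always-leader
election service could look like. It is only an illustration, not the code in
the pull request: the `LeaderContender` and `LeaderElectionService` interfaces
below are simplified, hypothetical stand-ins rather than Flink's actual
interfaces, and `CountingContender` and `AlwaysLeaderSketch` exist only for
the demo.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical, simplified stand-ins; NOT Flink's actual interfaces.
interface LeaderContender {
    void grantLeadership(UUID leaderSessionId);
    void revokeLeadership();
}

interface LeaderElectionService {
    void start(LeaderContender contender);
    void stop();
    boolean hasLeadership();
}

/** Grants leadership once on start() and revokes it only on stop(). */
final class AlwaysLeaderService implements LeaderElectionService {
    private LeaderContender contender; // null while the service is not running

    @Override
    public synchronized void start(LeaderContender contender) {
        Objects.requireNonNull(contender, "contender");
        if (this.contender != null) {
            throw new IllegalStateException("service already started");
        }
        this.contender = contender;
        // No ZooKeeper election: the single contender becomes leader right away.
        contender.grantLeadership(UUID.randomUUID());
    }

    @Override
    public synchronized void stop() {
        if (contender != null) {
            contender.revokeLeadership();
            contender = null;
        }
    }

    @Override
    public synchronized boolean hasLeadership() {
        return contender != null;
    }
}

/** Demo contender that only counts the callbacks it receives. */
final class CountingContender implements LeaderContender {
    int grants;
    int revokes;

    @Override
    public void grantLeadership(UUID leaderSessionId) {
        grants++;
    }

    @Override
    public void revokeLeadership() {
        revokes++;
    }
}

class AlwaysLeaderSketch {
    public static void main(String[] args) {
        CountingContender jobManager = new CountingContender();
        AlwaysLeaderService service = new AlwaysLeaderService();
        service.start(jobManager);
        System.out.println("leader after start: " + service.hasLeadership());
        service.stop();
        System.out.println("leader after stop: " + service.hasLeadership());
    }
}
```

In this sketch nothing depends on a ZooKeeper connection, so leadership is
never revoked while the process is alive; it ends only when the service is
stopped.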
About FLIP-6, I'll check the solution and see if anything there can help with
this :)
> Introduce a leader election service in yarn mode to make JobManager always
> available
> ------------------------------------------------------------------------------------
>
> Key: FLINK-6174
> URL: https://issues.apache.org/jira/browse/FLINK-6174
> Project: Flink
> Issue Type: Improvement
> Components: JobManager
> Reporter: Tao Wang
> Assignee: Tao Wang
>
> Currently in YARN mode, if ZooKeeper is chosen for high availability, Flink
> creates an election service that picks a leader through ZooKeeper election.
> When the ZooKeeper leader crashes or the connection between the JobManager and
> the ZooKeeper instance is broken, the JobManager's leadership is revoked and it
> sends a Disconnect message to the TaskManager, which cancels all running tasks
> and makes them wait for the connection between JM and ZK to be rebuilt.
> In YARN mode there is only one JobManager (AM) at any time, so it should
> always be the leader instead of being elected through ZooKeeper. We can
> introduce a new leader election service in YARN mode to achieve that.