[ https://issues.apache.org/jira/browse/FLINK-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941521#comment-15941521 ]
ASF GitHub Bot commented on FLINK-6174:
---------------------------------------

Github user WangTaoTheTonic commented on the issue:

    https://github.com/apache/flink/pull/3599

    I don't think it's a good idea, as it cannot solve the "split brain" issue either. The key problem is that `LeaderLatch` in Curator is too sensitive to the state of the connection to ZooKeeper (it revokes leadership when the connection to ZooKeeper is temporarily broken). Probably the best approach is to offer a "duller" LeaderLatch, which could also be used in a standalone cluster. I did the same work in our own private Spark release; let me see if it can be reused.


> Introduce a leader election service in yarn mode to make JobManager always available
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-6174
>                 URL: https://issues.apache.org/jira/browse/FLINK-6174
>             Project: Flink
>          Issue Type: Improvement
>          Components: JobManager
>            Reporter: Tao Wang
>            Assignee: Tao Wang
>
> Currently in yarn mode, if we use ZooKeeper as the high availability choice, it will
> create an election service that obtains a leader through ZooKeeper election.
> When the ZooKeeper leader crashes or the connection between the JobManager and
> the ZooKeeper instance is broken, the JobManager's leadership is revoked and it
> sends a Disconnect message to the TaskManager, which cancels all running tasks
> and makes them wait until the connection between JM and ZK is rebuilt.
> In yarn mode, we have one and only one JobManager (AM) at a time, and it should
> always be the leader instead of being elected through ZooKeeper. We can introduce a new
> leader election service in yarn mode to achieve that.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
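
For illustration only, the "duller" LeaderLatch idea from the comment above might look roughly like the sketch below. It is not the code from the pull request or from the private Spark release mentioned there. `ConnState` and `TolerantLeadership` are hypothetical names; the state names merely mirror Curator's `ConnectionState` values, and the grace period is an assumed parameter that would presumably need to stay below the ZooKeeper session timeout.

```java
// Hypothetical connection states, named after Curator's ConnectionState values.
enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

// Hypothetical helper (not part of Curator or Flink): keeps leadership across
// short connection drops and gives it up only after a grace period or on LOST.
final class TolerantLeadership {
    private final long graceMillis;
    private boolean leader = false;
    private long suspendedAt = -1L; // -1 means the connection is not suspended

    TolerantLeadership(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    // Called when the underlying election grants leadership.
    void grant() {
        leader = true;
        suspendedAt = -1L;
    }

    // Feed connection events in; timestamps are explicit to keep the sketch testable.
    void onStateChange(ConnState state, long nowMillis) {
        revokeIfGraceExpired(nowMillis);
        switch (state) {
            case SUSPENDED:
                // Remember when the outage started, but keep leadership for now.
                if (suspendedAt < 0) suspendedAt = nowMillis;
                break;
            case CONNECTED:
            case RECONNECTED:
                // Connection is back within the grace period: nothing to revoke.
                suspendedAt = -1L;
                break;
            case LOST:
                // The session is gone, so leadership cannot be kept.
                leader = false;
                suspendedAt = -1L;
                break;
        }
    }

    boolean isLeader(long nowMillis) {
        revokeIfGraceExpired(nowMillis);
        return leader;
    }

    // Revokes leadership once a suspension has outlasted the grace period.
    private void revokeIfGraceExpired(long nowMillis) {
        if (suspendedAt >= 0 && nowMillis - suspendedAt > graceMillis) {
            leader = false;
            suspendedAt = -1L;
        }
    }
}
```

With a 10 second grace period, a suspension that ends after 5 seconds leaves leadership untouched, so no Disconnect message would need to be sent to the TaskManagers in that case.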