[ https://issues.apache.org/jira/browse/FLINK-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940522#comment-15940522 ]
ASF GitHub Bot commented on FLINK-6174:
---------------------------------------

Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/3599

    I would suggest fixing this in the following way:

      - There is an upcoming patch that makes the Flink codebase use the `HighAvailabilityServices` properly in all places.
      - We introduce a new HA mode called `yarnsimple` or so (next to `none` and `zookeeper`) and instantiate a new implementation of `HighAvailabilityServices` which is ZooKeeper independent.
      - The new implementation of the High Availability Services does not use ZooKeeper. It uses a leader service that always grants the JobManager leadership, but also implements a way for TaskManagers to find the JobManager (to be seen how, possibly a file in HDFS or so). It also implements a ZooKeeper independent CompletedCheckpointStore that finds checkpoints by maintaining a file with completed checkpoints.

    That is all not a "proper" HA setup: it only works as long as there is strictly only one master. But it comes close and is ZooKeeper independent.

    Is that what you are looking for?


> Introduce a leader election service in yarn mode to make JobManager always available
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-6174
>                 URL: https://issues.apache.org/jira/browse/FLINK-6174
>             Project: Flink
>          Issue Type: Improvement
>          Components: JobManager
>            Reporter: Tao Wang
>            Assignee: Tao Wang
>
> Currently, in yarn mode, if we use ZooKeeper as the high availability choice, it will
> create an election service that picks a leader through ZooKeeper election.
> When the ZooKeeper leader crashes or the connection between the JobManager and
> the ZooKeeper instance is broken, the JobManager's leadership is revoked and it
> sends a Disconnect message to the TaskManagers, which cancel all running tasks
> and wait for the connection between JM and ZK to be rebuilt.
> In yarn mode, we have only one JobManager (AM) at any given time, and it should
> always be the leader instead of being elected through ZooKeeper. We can introduce a new
> leader election service in yarn mode to achieve that.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
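
For illustration only, the ZooKeeper-independent services suggested in the comment above could be sketched roughly as follows. This is a minimal, self-contained sketch under stated assumptions, not Flink code: `SingleMasterHaSketch`, `LeaderContender`, `AlwaysLeaderService`, and `FileCheckpointStore` are hypothetical names, a local file stands in for "a file in HDFS or so", and the tab-separated index format is invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch; none of these types are Flink classes.
class SingleMasterHaSketch {

    /** Component that wants to become leader (hypothetical callback). */
    interface LeaderContender {
        void grantLeadership(UUID leaderSessionId);
    }

    /** Leader service without an election: the single master always wins. */
    static final class AlwaysLeaderService {
        private final Path addressFile;

        AlwaysLeaderService(Path addressFile) {
            this.addressFile = addressFile;
        }

        /** Publishes the master address for workers, then grants leadership. */
        void start(LeaderContender contender, String address) {
            try {
                // Default options create the file or truncate an existing one.
                Files.write(addressFile, address.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            contender.grantLeadership(UUID.randomUUID());
        }

        /** Workers find the master by reading the shared file. */
        Optional<String> lookupLeader() {
            if (!Files.exists(addressFile)) {
                return Optional.empty();
            }
            try {
                byte[] bytes = Files.readAllBytes(addressFile);
                return Optional.of(new String(bytes, StandardCharsets.UTF_8).trim());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Checkpoint store that keeps an append-only index file of completed checkpoints. */
    static final class FileCheckpointStore {
        private final Path indexFile;

        FileCheckpointStore(Path indexFile) {
            this.indexFile = indexFile;
        }

        /** Appends one "id<TAB>location" line per completed checkpoint. */
        void add(long checkpointId, String location) {
            String line = checkpointId + "\t" + location + System.lineSeparator();
            try {
                Files.write(indexFile, line.getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** Returns the location of the most recently added checkpoint, if any. */
        Optional<String> latestLocation() {
            if (!Files.exists(indexFile)) {
                return Optional.empty();
            }
            try {
                List<String> lines = Files.readAllLines(indexFile, StandardCharsets.UTF_8);
                if (lines.isEmpty()) {
                    return Optional.empty();
                }
                return Optional.of(lines.get(lines.size() - 1).split("\t", 2)[1]);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ha-sketch");
        AlwaysLeaderService leader = new AlwaysLeaderService(dir.resolve("leader"));
        leader.start(id -> System.out.println("granted leadership, session " + id),
                "jobmanager-host:6123");
        System.out.println("workers see leader at " + leader.lookupLeader().orElse("<none>"));

        FileCheckpointStore store = new FileCheckpointStore(dir.resolve("checkpoints"));
        store.add(1L, "hdfs:///checkpoints/chk-1");
        store.add(2L, "hdfs:///checkpoints/chk-2");
        System.out.println("restore from " + store.latestLocation().orElse("<none>"));
    }
}
```

As in the comment, nothing here arbitrates between competing masters, so the sketch is only sound while there is strictly one master writing these files.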