[jira] [Commented] (FLINK-6174) Introduce a leader election service in yarn mode to make JobManager always available

ASF GitHub Bot (JIRA) Thu, 23 Mar 2017 02:43:04 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938000#comment-15938000
 ]


ASF GitHub Bot commented on FLINK-6174:
---------------------------------------

Github user wenlong88 commented on the issue:

    https://github.com/apache/flink/pull/3599
  
    Hi, I may have described my concern poorly in my last comment. My concern 
is that in YARN it is possible for two application masters (AMs) to be running 
at the same time.
    For example: the RM launches an AM, and then that machine loses its 
connection to the RM for some reason, so the RM launches another AM. The first 
AM may still be running when the second AM is launched, e.g. when the NM's 
heartbeat has timed out but the NM itself is still running.
    When two AMs can be running at the same time, we may end up in a deadlock 
when using the AlwaysLeaderService, as follows: 
    1. the first AM is granted leadership
    2. the second AM is granted leadership
    3. the second AM writes its leader info
    4. the first AM writes its leader info
    5. the first AM is killed by the NM or some cluster monitoring tool, since 
the RM has marked the NM as unavailable.
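
    The interleaving above can be replayed with a small simulation. This is 
only a sketch under stated assumptions, not Flink code: LeaderInfoStore, 
ApplicationMaster and replay_interleaving are hypothetical names, the store is 
assumed to be last-write-wins with no fencing, and leadership is assumed to be 
granted unconditionally. Under those assumptions the published leader info 
ends up naming the first AM, which is no longer running, while the second AM 
still considers itself leader and has no reason to write again.

```python
from dataclasses import dataclass
from typing import Optional


class LeaderInfoStore:
    """Hypothetical shared location for leader info (assumption: last write wins)."""

    def __init__(self) -> None:
        self.leader: Optional[str] = None

    def write(self, am_id: str) -> None:
        # No version check or fencing: a later write silently replaces an earlier one.
        self.leader = am_id


@dataclass
class ApplicationMaster:
    """Hypothetical model of one AM attempt."""
    am_id: str
    alive: bool = True
    is_leader: bool = False

    def grant_leadership(self) -> None:
        # Assumption: an "always leader" service grants leadership unconditionally.
        self.is_leader = True

    def write_leader_info(self, store: LeaderInfoStore) -> None:
        if self.alive and self.is_leader:
            store.write(self.am_id)

    def kill(self) -> None:
        self.alive = False


def replay_interleaving():
    store = LeaderInfoStore()
    first = ApplicationMaster("am-1")
    second = ApplicationMaster("am-2")

    first.grant_leadership()          # step 1
    second.grant_leadership()         # step 2
    second.write_leader_info(store)   # step 3
    first.write_leader_info(store)    # step 4: overwrites the second AM's entry
    first.kill()                      # step 5
    return store, first, second


store, first, second = replay_interleaving()
print("published leader:", store.leader)
print("first AM alive:", first.alive)
print("second AM alive and leader:", second.alive and second.is_leader)
```

    In this model, anything that resolves the leader through the store would 
be pointed at an AM that is gone. Some form of fencing on the write (for 
example, comparing an attempt identifier before overwriting) would be one way 
to avoid this, though that is a suggestion and not something proposed in the 
pull request.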



> Introduce a leader election service in yarn mode to make JobManager always 
> available
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-6174
>                 URL: https://issues.apache.org/jira/browse/FLINK-6174
>             Project: Flink
>          Issue Type: Improvement
>          Components: JobManager
>            Reporter: Tao Wang
>            Assignee: Tao Wang
>
> Currently in YARN mode, if we use ZooKeeper as the high-availability choice, 
> an election service is created that picks a leader through ZooKeeper election.
> When the ZooKeeper leader crashes or the connection between the JobManager and 
> the ZooKeeper instance is broken, the JobManager's leadership is revoked and it 
> sends a Disconnect message to the TaskManagers, which cancel all running tasks 
> and wait for the connection between JM and ZK to be rebuilt.
> In YARN mode, there is only one JobManager (AM) at any time, and it should 
> always be the leader instead of being elected through ZooKeeper. We can 
> introduce a new leader election service in YARN mode to achieve that.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
