[jira] [Updated] (FLINK-14010) Dispatcher & JobManagers don't give up leadership when AM is shut down

Till Rohrmann (Jira) Tue, 24 Sep 2019 05:50:07 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Till Rohrmann updated FLINK-14010:
----------------------------------
    Fix Version/s: 1.8.3
                   1.10.0

> Dispatcher & JobManagers don't give up leadership when AM is shut down
> ----------------------------------------------------------------------
>
>                 Key: FLINK-14010
>                 URL: https://issues.apache.org/jira/browse/FLINK-14010
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN, Runtime / Coordination
>    Affects Versions: 1.7.2, 1.8.2, 1.9.0, 1.10.0
>            Reporter: tison
>            Assignee: tison
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.10.0, 1.9.1, 1.8.3
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In a YARN deployment, the YARN RM may launch a new AM for the job even if 
> the previous AM has not terminated, for example when the AMRM heartbeat 
> times out. In this common case the RM sends a shutdown request to the 
> previous AM and expects it to shut down properly.
> However, currently in {{YARNResourceManager}}, we handle this request in 
> {{onShutdownRequest}}, which simply closes the {{YARNResourceManager}} *but 
> not the Dispatcher and JobManagers*. Thus, the Dispatcher and JobManagers 
> launched in the new AM cannot be granted leadership properly. Visually,
> on previous AM: Dispatcher leader, JM leaders
> on new AM: ResourceManager leader
> Since on the client side, or in per-job mode, the JobManager address and 
> port are configured to point to the new AM, the whole cluster goes into an 
> unrecoverable, inconsistent state: all client queries go to the Dispatcher 
> on the new AM, which is not the leader. In short, the Dispatcher and 
> JobManagers on the previous AM do not give up their leadership properly.
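
The split-leadership state described above can be reproduced with a toy model. The following is only a sketch under stated assumptions, not Flink code: LeadershipSketch, tryAcquire, release, shutdownResourceManagerOnly and shutdownAllComponents are hypothetical names, and leadership is modelled as a plain in-memory map rather than a real leader election service. It contrasts a shutdown that releases only the ResourceManager leadership with one possible remedy that releases all three.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of per-component leader election; hypothetical, not Flink's API. */
class LeadershipSketch {
    static final String[] COMPONENTS = {"ResourceManager", "Dispatcher", "JobManager"};

    // Maps a component name to the AM that currently holds its leadership.
    private final Map<String, String> leaders = new HashMap<>();

    /** Grants leadership only if nobody holds it yet; returns true if 'am' is the leader afterwards. */
    boolean tryAcquire(String component, String am) {
        return am.equals(leaders.computeIfAbsent(component, c -> am));
    }

    /** Releases leadership of 'component' only if 'am' is the current holder. */
    void release(String component, String am) {
        leaders.remove(component, am);
    }

    String leaderOf(String component) {
        return leaders.get(component);
    }

    /** Behaviour described in the report: only the ResourceManager is closed. */
    static void shutdownResourceManagerOnly(LeadershipSketch election, String am) {
        election.release("ResourceManager", am);
    }

    /** One possible remedy (an assumption, not the actual patch): every component gives up leadership. */
    static void shutdownAllComponents(LeadershipSketch election, String am) {
        for (String component : COMPONENTS) {
            election.release(component, am);
        }
    }

    public static void main(String[] args) {
        LeadershipSketch election = new LeadershipSketch();
        for (String component : COMPONENTS) {
            election.tryAcquire(component, "previous-AM");
        }
        // The previous AM receives the shutdown request but closes only its ResourceManager.
        shutdownResourceManagerOnly(election, "previous-AM");
        // The new AM now tries to take over every component.
        for (String component : COMPONENTS) {
            election.tryAcquire(component, "new-AM");
            System.out.println(component + " leader: " + election.leaderOf(component));
        }
    }
}
```

Replacing shutdownResourceManagerOnly with shutdownAllComponents in main lets the new AM acquire all three leaderships, which is the consistent state the report asks for.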



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
