[jira] [Commented] (FLINK-16279) Per job Yarn application leak in normal execution mode.

Aljoscha Krettek (Jira) Wed, 04 Mar 2020 06:21:17 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-16279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051260#comment-17051260
 ]


Aljoscha Krettek commented on FLINK-16279:
------------------------------------------

This has always been a problem with {{ExecutionMode.NORMAL}} (which is used 
when the client is run "attached"). When we did the executor/JobClient work we 
didn't want to tackle it because it's a bit complicated, but we should do it 
now. I think the only good option is to not try to be clever; cleverness 
usually comes back to bite us in the long run. What I'm proposing is:

- remove the distinction between {{ExecutionMode.NORMAL}} and 
{{ExecutionMode.DETACHED}}; a per-job cluster always shuts down when the job 
reaches a terminal state
- if a client needs a job execution result, it has to get it through the 
cluster manager, i.e. YARN or Kubernetes, because that is the entity that 
outlives the job and thus has that information
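
To make the first point concrete, here is a minimal sketch of what "always shut down on terminal state" could look like. It is not Flink code: {{PerJobShutdownSketch}}, its nested {{JobResult}} and {{ApplicationStatus}} types, and {{shutDownOnTerminalState}} are hypothetical stand-ins, and only the JDK's {{CompletableFuture}} is used. The shutdown future is derived directly from the job result future, so it completes whether or not a client ever asks for the result.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-ins for illustration only; none of these are Flink classes.
final class PerJobShutdownSketch {

    enum ApplicationStatus { SUCCEEDED, FAILED }

    /** Minimal stand-in for a job result: it only records whether the job failed. */
    static final class JobResult {
        final boolean failed;

        JobResult(boolean failed) {
            this.failed = failed;
        }
    }

    /**
     * Derives the shutdown future from the job's terminal state alone,
     * with no branch on an execution mode and no dependency on a client request.
     */
    static CompletableFuture<ApplicationStatus> shutDownOnTerminalState(
            CompletableFuture<JobResult> jobResultFuture) {
        return jobResultFuture.thenApply(
                result -> result.failed ? ApplicationStatus.FAILED : ApplicationStatus.SUCCEEDED);
    }

    public static void main(String[] args) {
        CompletableFuture<JobResult> jobResultFuture = new CompletableFuture<>();
        CompletableFuture<ApplicationStatus> shutDownFuture =
                shutDownOnTerminalState(jobResultFuture);

        // The job fails and no client ever requests the result.
        jobResultFuture.complete(new JobResult(true));

        System.out.println(shutDownFuture.isDone()); // prints "true"
        System.out.println(shutDownFuture.join());   // prints "FAILED"
    }
}
```

In this sketch the final status would still have to be reported to the cluster manager before the cluster goes away, which is where the second point comes in.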

> Per job Yarn application leak in normal execution mode.
> -------------------------------------------------------
>
>                 Key: FLINK-16279
>                 URL: https://issues.apache.org/jira/browse/FLINK-16279
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission, Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Wenlong Lyu
>            Priority: Major
>
> I ran a job in YARN per-job mode using {{env.executeAsync}}; the job failed 
> but the YARN cluster was not destroyed.
> After some research in the code, I found that:
> when running in attached mode, MiniDispatcher never completes 
> {{shutDownFuture}} until it has received a request from the job client. 
> {code}
> if (executionMode == ClusterEntrypoint.ExecutionMode.NORMAL) {
>     // terminate the MiniDispatcher once we served the first JobResult successfully
>     jobResultFuture.thenAccept((JobResult result) -> {
>         ApplicationStatus status = result.getSerializedThrowable().isPresent() ?
>                 ApplicationStatus.FAILED : ApplicationStatus.SUCCEEDED;
>         LOG.debug("Shutting down per-job cluster because someone retrieved the job result.");
>         shutDownFuture.complete(status);
>     });
> }
> {code}
> However, when running in async mode (submitting the job via 
> {{env.executeAsync}}), there may be no request from the job client: once a 
> user learns from the job client that the job has failed, they may never 
> request the result again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
