[ https://issues.apache.org/jira/browse/FLINK-16279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051260#comment-17051260 ]
Aljoscha Krettek commented on FLINK-16279:
------------------------------------------
This has always been a problem with {{ExecutionMode.NORMAL}} (which is used
when the client is run "attached"). When we did the executor/JobClient work we
didn't want to tackle it because it's a bit complicated, but we should do it
now. I think the only good option is to not try to be clever; cleverness
usually comes back to bite us in the long run. What I'm proposing is:
- remove the distinction between {{ExecutionMode.NORMAL}} and
{{ExecutionMode.DETACHED}}: a per-job cluster always shuts down when the job
reaches a terminal state
- if a client needs a job execution result, it has to get it through
the cluster manager, i.e. YARN or Kubernetes, because that is the entity that
actually outlives the job and thus has that information
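
A minimal sketch of the first point, to make the intended behaviour concrete. It is not Flink's actual {{MiniDispatcher}}: the class {{PerJobShutdownSketch}}, the method {{jobReachedTerminalState}} and the local {{ApplicationStatus}} enum are hypothetical stand-ins. The only idea it illustrates is that the shutdown future is completed as soon as the job is terminal, with no check on the execution mode and no dependency on a client fetching the result.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the dispatcher logic; NOT Flink's actual MiniDispatcher.
class PerJobShutdownSketch {

    // Local stand-in for Flink's ApplicationStatus, reduced to the two values used here.
    enum ApplicationStatus { SUCCEEDED, FAILED }

    // Completed when the (simulated) per-job cluster should shut down.
    final CompletableFuture<ApplicationStatus> shutDownFuture = new CompletableFuture<>();

    // Proposed behaviour: no execution-mode check, any terminal state triggers shutdown.
    void jobReachedTerminalState(boolean jobFailed) {
        ApplicationStatus status =
            jobFailed ? ApplicationStatus.FAILED : ApplicationStatus.SUCCEEDED;
        shutDownFuture.complete(status);
    }

    public static void main(String[] args) {
        PerJobShutdownSketch dispatcher = new PerJobShutdownSketch();

        // The job fails and no client ever asks for the result.
        dispatcher.jobReachedTerminalState(true);

        System.out.println("shutdown requested: " + dispatcher.shutDownFuture.isDone());
        System.out.println("status: " + dispatcher.shutDownFuture.join());
    }
}
```

With this shape, a failed job that nobody polls still releases its cluster; a client that wants the outcome afterwards would have to ask the cluster manager, as described in the second point.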
> Per job Yarn application leak in normal execution mode.
> -------------------------------------------------------
>
> Key: FLINK-16279
> URL: https://issues.apache.org/jira/browse/FLINK-16279
> Project: Flink
> Issue Type: Bug
> Components: Client / Job Submission, Runtime / Coordination
> Affects Versions: 1.10.0
> Reporter: Wenlong Lyu
> Priority: Major
>
> I ran a job in YARN per-job mode using {{env.executeAsync}}; the job failed
> but the YARN cluster was not destroyed.
> After some research on the code, I found that
> when running in attached mode, MiniDispatcher never completes
> {{shutDownFuture}} until it has received a request from the job client:
> {code}
> if (executionMode == ClusterEntrypoint.ExecutionMode.NORMAL) {
>     // terminate the MiniDispatcher once we served the first JobResult successfully
>     jobResultFuture.thenAccept((JobResult result) -> {
>         ApplicationStatus status = result.getSerializedThrowable().isPresent() ?
>             ApplicationStatus.FAILED : ApplicationStatus.SUCCEEDED;
>         LOG.debug("Shutting down per-job cluster because someone retrieved the job result.");
>         shutDownFuture.complete(status);
>     });
> }
> {code}
> However, when running in async mode (submitting the job via env.executeAsync), there
> may never be such a request from the job client: once a user learns from the job
> client that the job has failed, they may never request the result again.