[ 
https://issues.apache.org/jira/browse/FLINK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated FLINK-25410:
---------------------------------
    Description: 
h2. Why

In our internal streaming platform, we use the flink-cli tool to submit Flink 
streaming applications on YARN.

However, when the Hadoop cluster goes down and many Flink apps then need to be 
resubmitted, the submitter worker in our platform hangs.

This happens because YARN cluster resources are tight, and scheduling becomes 
slow when many apps need to be started at the same time.

Meanwhile, flink-cli does not exit until the app status changes to RUNNING.

In addition, I think there is no need to keep waiting once the app status is 
ACCEPTED when submitting in detached mode on YARN.
h2. How

When the app is in ACCEPTED status, flink-cli should exit directly to release 
the submitter worker's process resources. For the PR, refer to: 
https://github.com/apache/flink/blob/f191becdb42d6df823a103dc4f787c4737baa8e7/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L1224
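To illustrate the proposed behaviour, here is a minimal, self-contained Java sketch of a client that polls an application's state and decides when to stop waiting. It does not reproduce the code at the linked YarnClusterDescriptor line: the names DetachedSubmitSketch, AppState, shouldStopWaiting and waitForApp are hypothetical, and AppState only mirrors the values of Hadoop's YarnApplicationState so the example runs without Hadoop on the classpath.

```java
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical sketch of "exit on ACCEPTED in detached mode"; not Flink's actual API. */
public class DetachedSubmitSketch {

    // Mirrors the values of Hadoop's YarnApplicationState (assumption: local copy for illustration).
    enum AppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    // States from which the application will never reach RUNNING.
    static final Set<AppState> TERMINAL =
            EnumSet.of(AppState.FINISHED, AppState.FAILED, AppState.KILLED);

    /** Returns true once the client may stop polling for the application state. */
    static boolean shouldStopWaiting(AppState state, boolean detached) {
        if (state == AppState.RUNNING || TERMINAL.contains(state)) {
            return true; // current behaviour: wait for RUNNING or a terminal state
        }
        // Proposed change: in detached mode, ACCEPTED is enough to return.
        return detached && state == AppState.ACCEPTED;
    }

    /** Polls a (hypothetical) state source until shouldStopWaiting is satisfied. */
    static AppState waitForApp(Supplier<AppState> poll, boolean detached, long intervalMs)
            throws InterruptedException {
        while (true) {
            AppState state = poll.get();
            if (shouldStopWaiting(state, detached)) {
                return state; // caller would treat FAILED/KILLED as a submission failure
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated state reports: the app stays ACCEPTED for a while on a busy cluster.
        List<AppState> reports = List.of(
                AppState.SUBMITTED, AppState.ACCEPTED, AppState.ACCEPTED, AppState.RUNNING);

        Iterator<AppState> detachedFeed = reports.iterator();
        System.out.println(waitForApp(detachedFeed::next, true, 10));  // ACCEPTED

        Iterator<AppState> attachedFeed = reports.iterator();
        System.out.println(waitForApp(attachedFeed::next, false, 10)); // RUNNING
    }
}
```

In this sketch only the detached path returns early; an attached submission keeps waiting for RUNNING, so its existing behaviour is unchanged.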

  was:
h2. Why
In our internal streaming platform, we use the flink-cli tool to submit Flink 
streaming applications on YARN.

However, when the Hadoop cluster goes down and many Flink apps then need to be 
resubmitted, the submitter worker in our platform hangs.

Because the YARN cluster resources are tight, flink-cli does not exit until the 
app's status changes to RUNNING.


> Flink CLI should exit when app is accepted with detach mode on Yarn
> -------------------------------------------------------------------
>
>                 Key: FLINK-25410
>                 URL: https://issues.apache.org/jira/browse/FLINK-25410
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>            Reporter: Junfan Zhang
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
