[ 
https://issues.apache.org/jira/browse/FLINK-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-13938:
-------------------------------------
    Component/s: Client / Job Submission

> Use pre-uploaded libs to accelerate flink submission
> ----------------------------------------------------
>
>                 Key: FLINK-13938
>                 URL: https://issues.apache.org/jira/browse/FLINK-13938
>             Project: Flink
>          Issue Type: New Feature
>          Components: Client / Job Submission, Deployment / YARN
>            Reporter: Yang Wang
>            Assignee: Yang Wang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, every time we start a Flink cluster, the Flink lib jars have to be 
> uploaded to HDFS and registered as YARN local resources so that they can be 
> downloaded to the JobManager and all TaskManager containers. I think we could 
> make two optimizations.
>  # Use a pre-uploaded Flink binary to avoid uploading the Flink system jars.
>  # Use the YARN public cache to eliminate unnecessary jar downloads and make 
> container launches faster. The public cache can be shared by different 
> applications.
>  
> By default, the LocalResourceVisibility is APPLICATION, so the jars are 
> downloaded only once per node and shared by all TaskManager containers of the 
> same application on that node. However, different applications have to 
> download all jars every time, including flink-dist.jar. We could use the 
> YARN public cache to eliminate these unnecessary downloads and make 
> container launches faster.
>  
>  
> This follows the discussion on the user mailing list: 
> [https://lists.apache.org/[email protected]:lte=1M:Flink%20Conf%20%22yarn.flink-dist-jar%22%20Question]
>  Taking both FLINK-13938 and FLINK-14964 into account, this feature will be 
> implemented in the following steps.
>  * Extend "-yt/--yarnship" to support HDFS directories
>  * Add a new config option to control whether the flink-dist upload is 
> disabled
>  * Extend "-yt/--yarnship" to specify the local resource visibility. It is 
> "APPLICATION" by default. It could also be set to "PUBLIC", which means 
> shared by all applications, or "PRIVATE", which means shared by all 
> applications of the same user. (*Will be done later according to the feedback*)
>   
>  How to use this feature?
>  1. First, upload the Flink binary and user jars to HDFS directories
>  2. Use "-yt/--yarnship" to specify the pre-uploaded libs
>  3. Disable the automatic upload of flink-dist by setting 
> {{yarn.submission.automatic-flink-dist-upload}}: false
>   
>  A final submission command could look like the following:
> {code:java}
> ./bin/flink run -m yarn-cluster -d \
> -yt hdfs://myhdfs/flink/release/flink-1.11 \
> -yD yarn.submission.automatic-flink-dist-upload=false \
> examples/streaming/WindowJoin.jar
> {code}
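
The three usage steps quoted above could be scripted roughly as follows. This is only a sketch: FLINK_HOME, HDFS_LIB_DIR, DRY_RUN and the helper functions are hypothetical names introduced for illustration, and which local files need to be present under the shipped HDFS directory is an assumption (here the contents of the local lib directory). With DRY_RUN=1, the default, the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: variable and function names below are illustrative assumptions,
# not part of Flink.
set -eu

FLINK_HOME="${FLINK_HOME:-.}"
HDFS_LIB_DIR="${HDFS_LIB_DIR:-hdfs://myhdfs/flink/release/flink-1.11}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: upload the Flink libs to HDFS once (assumed layout: lib/ under the target dir).
upload_libs() {
  run hdfs dfs -mkdir -p "$HDFS_LIB_DIR"
  run hdfs dfs -put -f "$FLINK_HOME/lib" "$HDFS_LIB_DIR/"
}

# Steps 2 and 3: ship the pre-uploaded directory and disable the flink-dist upload.
submit_job() {
  run "$FLINK_HOME/bin/flink" run -m yarn-cluster -d \
    -yt "$HDFS_LIB_DIR" \
    -yD yarn.submission.automatic-flink-dist-upload=false \
    "$1"
}

upload_libs
submit_job examples/streaming/WindowJoin.jar
```

The upload only has to happen once per Flink release; later submissions can call submit_job alone and reuse the same HDFS directory.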



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
