[
https://issues.apache.org/jira/browse/FLINK-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kostas Kloudas updated FLINK-13938:
-----------------------------------
Affects Version/s: 1.11.0
> Use pre-uploaded libs to accelerate flink submission
> ----------------------------------------------------
>
> Key: FLINK-13938
> URL: https://issues.apache.org/jira/browse/FLINK-13938
> Project: Flink
> Issue Type: New Feature
> Components: Client / Job Submission, Deployment / YARN
> Affects Versions: 1.11.0
> Reporter: Yang Wang
> Assignee: Yang Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.11.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently, every time we start a Flink cluster, the Flink lib jars need to be
> uploaded to HDFS and then registered as YARN local resources so that they can be
> downloaded to the jobmanager and all taskmanager containers. I think we could make
> two optimizations.
> # Use a pre-uploaded Flink binary to avoid uploading the Flink system jars.
> # By default, the LocalResourceVisibility is APPLICATION, so the jars are
> downloaded only once and shared by all taskmanager containers of the same
> application on the same node. However, different applications have to
> download all jars every time, including the flink-dist.jar. We could use the
> YARN public cache to eliminate the unnecessary jar downloads and make
> container launches faster.
>
> How does the feature work?
> * Add {{yarn.provided.lib.dirs}} to configure pre-uploaded libs, which
> contain files that are useful for all users of the platform (i.e.
> different applications).
> * When the Flink client wants to ship a local file, it checks the
> provided libs first. If the provided libs contain a file with the same name,
> the local ship file is automatically excluded from uploading.
> * These provided libs need to be publicly readable and will be set with
> {{PUBLIC}} visibility for local resources, so they will be cached on the nodes
> and shared by different applications.
>
> How to use the pre-upload feature?
> 1. First, upload the Flink binary to the HDFS directories
> 2. Use {{yarn.provided.lib.dirs}} to specify the pre-uploaded libs
>
> A final submission command could look like the following.
> {code:java}
> ./bin/flink run -m yarn-cluster -d \
> -yD yarn.provided.lib.dirs=hdfs://myhdfs/flink/lib,hdfs://myhdfs/flink/plugins \
> examples/streaming/WindowJoin.jar
> {code}
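The name-based exclusion described in the issue (a local ship file is skipped when the provided libs already contain a file with the same name) can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual Flink client code: the class `ProvidedLibFilter`, the method `filesToUpload`, and the example file names are all hypothetical, and the provided libs are represented simply as a set of file names rather than being listed from HDFS.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Flink: illustrates the "same file name" check only.
final class ProvidedLibFilter {

    /**
     * Returns the local ship files that still need uploading, i.e. those whose
     * file name does not appear among the pre-uploaded (provided) lib file names.
     */
    static List<Path> filesToUpload(List<Path> localShipFiles, Set<String> providedFileNames) {
        List<Path> result = new ArrayList<>();
        for (Path local : localShipFiles) {
            // Compare by file name only, as the issue describes.
            String name = local.getFileName().toString();
            if (!providedFileNames.contains(name)) {
                result.add(local);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed contents of the provided lib dirs (file names are made up).
        Set<String> provided = new HashSet<>(Arrays.asList("flink-dist.jar", "log4j.jar"));
        List<Path> local = Arrays.asList(
                Paths.get("lib/flink-dist.jar"),
                Paths.get("usrlib/my-udf.jar"));
        // Only the file that is absent from the provided libs remains in the list.
        System.out.println(filesToUpload(local, provided));
    }
}
```

Matching on the file name alone keeps the check cheap, but it also means a local jar with the same name and different contents would be silently skipped in this sketch.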