[ https://issues.apache.org/jira/browse/FLINK-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yang Wang updated FLINK-13938:
------------------------------
Summary: Use pre-uploaded libs to accelerate flink submission (was: Use pre-uploaded flink binary to accelerate flink submission)
> Use pre-uploaded libs to accelerate flink submission
> ----------------------------------------------------
>
> Key: FLINK-13938
> URL: https://issues.apache.org/jira/browse/FLINK-13938
> Project: Flink
> Issue Type: New Feature
> Components: Deployment / YARN
> Reporter: Yang Wang
> Assignee: Yang Wang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently, every time we start a Flink cluster, the Flink lib jars need to be
> uploaded to HDFS and then registered as YARN local resources so that they can
> be downloaded to the JobManager and all TaskManager containers. I think we
> could make two optimizations:
> # Use a pre-uploaded Flink binary to avoid uploading the Flink system jars.
> # Use the YARN public cache to eliminate unnecessary jar downloads and make
> container launching faster. The public cache can be shared by different
> applications.
>
> By default, the LocalResourceVisibility is APPLICATION, so the jars are
> downloaded only once per node and shared by all TaskManager containers of the
> same application on that node. However, each application still has to
> download all the jars itself, including flink-dist.jar. With the YARN public
> cache, such jars would be downloaded once per node and reused across
> applications.
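The caching behaviour described above can be illustrated with a small toy model. This is a hypothetical sketch and not YARN's NodeManager code. It assumes that the node-local cache is keyed by node and path for PUBLIC, additionally by user for PRIVATE, and by application for APPLICATION. It then counts how often a jar such as flink-dist.jar would actually be fetched. The class and method names (`LocalizationModel`, `countDownloads`) are made up for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical toy model of how a node-local cache could be keyed for each
 * LocalResourceVisibility value; not YARN's actual implementation.
 */
public class LocalizationModel {

    enum Visibility { PUBLIC, PRIVATE, APPLICATION }

    /** One container launch on a node, on behalf of a user and an application. */
    static final class Launch {
        final String node;
        final String user;
        final String appId;

        Launch(String node, String user, String appId) {
            this.node = node;
            this.user = user;
            this.appId = appId;
        }
    }

    /** Launches that produce the same key can reuse an already-downloaded file. */
    static String cacheKey(Visibility v, Launch l, String path) {
        switch (v) {
            case PUBLIC:
                return l.node + "|" + path;                 // shared by all users and apps on the node
            case PRIVATE:
                return l.node + "|" + l.user + "|" + path;  // shared by the same user's apps
            default:
                return l.node + "|" + l.appId + "|" + path; // APPLICATION: same app only
        }
    }

    /** Counts how many times {@code path} is downloaded for the given launches. */
    static int countDownloads(Visibility v, String path, List<Launch> launches) {
        Set<String> cached = new HashSet<>();
        int downloads = 0;
        for (Launch l : launches) {
            if (cached.add(cacheKey(v, l, path))) {
                downloads++; // cache miss: the file must be fetched from HDFS
            }
        }
        return downloads;
    }

    public static void main(String[] args) {
        // Illustrative launches: two apps by alice (app_1 has two containers), one app by bob.
        List<Launch> launches = List.of(
                new Launch("node1", "alice", "app_1"),
                new Launch("node1", "alice", "app_1"),
                new Launch("node1", "alice", "app_2"),
                new Launch("node1", "bob", "app_3"));
        for (Visibility v : Visibility.values()) {
            System.out.println(v + ": " + countDownloads(v, "flink-dist.jar", launches) + " download(s)");
        }
    }
}
```

With these four launches on one node, the model reports 1 download for PUBLIC, 2 for PRIVATE (one per user) and 3 for APPLICATION (one per application). That gap is the redundant traffic the public cache is meant to remove.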
>
>
> Following the discussion on the user mailing list:
> [https://lists.apache.org/[email protected]:lte=1M:Flink%20Conf%20%22yarn.flink-dist-jar%22%20Question]
> Taking both FLINK-13938 and FLINK-14964 into account, this feature will be
> implemented in the following steps:
> * Extend "-yt/--yarnship" to support HDFS directories.
> * Extend "-yt/--yarnship" to specify the local resource visibility. It is
> "APPLICATION" by default. It can also be set to "PUBLIC", which means the
> resource is shared by all applications, or "PRIVATE", which means it is shared
> by all applications of the same user.
> * Add a new config option to control whether to optimize the submission
> (default: false). When it is set to true, the Flink client will try to filter
> the jars and files by name and size, skipping those already present in the
> pre-uploaded directories, to avoid unnecessary uploading.
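The two "-yt" extensions and the name/size filter could look roughly like the following Java sketch. Everything here is hypothetical. The class and method names (`ShipSpecSketch`, `parseShipSpec`, `filesToUpload`) are made up. The sketch assumes the visibility is appended as an upper-case `:PUBLIC`/`:PRIVATE`/`:APPLICATION` suffix, as in the example command in this issue. It also models files as name-to-size maps instead of calling the Hadoop FileSystem API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the client-side logic proposed above; the names and
 * the "path:VISIBILITY" parsing rules are assumptions, not Flink's code.
 */
public class ShipSpecSketch {

    enum Visibility { PUBLIC, PRIVATE, APPLICATION }

    /** One shipped directory plus the visibility it should be registered with. */
    static final class ShipEntry {
        final String path;
        final Visibility visibility;

        ShipEntry(String path, Visibility visibility) {
            this.path = path;
            this.visibility = visibility;
        }

        @Override
        public String toString() {
            return path + " [" + visibility + "]";
        }
    }

    /** Parses "dir1:PUBLIC,dir2" style input; a missing suffix means APPLICATION. */
    static List<ShipEntry> parseShipSpec(String spec) {
        List<ShipEntry> entries = new ArrayList<>();
        for (String item : spec.split(",")) {
            String path = item.trim();
            Visibility visibility = Visibility.APPLICATION;
            // Only the text after the LAST colon can be a visibility, because the
            // "hdfs://" scheme (and an optional port) also contain colons.
            int idx = path.lastIndexOf(':');
            if (idx >= 0) {
                String suffix = path.substring(idx + 1);
                for (Visibility v : Visibility.values()) {
                    if (v.name().equals(suffix)) {
                        visibility = v;
                        path = path.substring(0, idx);
                        break;
                    }
                }
            }
            entries.add(new ShipEntry(path, visibility));
        }
        return entries;
    }

    /**
     * Returns the local files that still need uploading. A file is skipped when a
     * pre-uploaded file has the same name and the same size in bytes.
     */
    static List<String> filesToUpload(Map<String, Long> localSizes, Map<String, Long> preUploadedSizes) {
        List<String> toUpload = new ArrayList<>();
        for (Map.Entry<String, Long> e : localSizes.entrySet()) {
            Long remoteSize = preUploadedSizes.get(e.getKey());
            if (remoteSize == null || !remoteSize.equals(e.getValue())) {
                toUpload.add(e.getKey());
            }
        }
        return toUpload;
    }

    public static void main(String[] args) {
        String spec = "hdfs://myhdfs/flink/release/flink-1.11:PUBLIC,hdfs://myhdfs/user/someone/mylib";
        parseShipSpec(spec).forEach(System.out::println);

        // Illustrative file names and sizes only.
        Map<String, Long> local = new LinkedHashMap<>();
        local.put("flink-dist.jar", 100L);
        local.put("my-udfs.jar", 7L);
        Map<String, Long> preUploaded = Map.of("flink-dist.jar", 100L);
        System.out.println("to upload: " + filesToUpload(local, preUploaded));
    }
}
```

Running `main` prints the first directory with `[PUBLIC]`, the second with the default `[APPLICATION]`, and `to upload: [my-udfs.jar]`. Matching on name and size keeps the check cheap, since a directory listing already carries file sizes. The trade-off is that a changed jar with the same name and size would be missed; comparing checksums would be stricter but slower.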
>
> How to use this feature:
> 1. Upload the Flink binary and user jars to HDFS directories.
> 2. Use "-yt/--yarnship" to specify the pre-uploaded libs.
> 3. Enable the submission optimization.
>
> A final submission command could look like the following:
> {code:bash}
> ./bin/flink run -m yarn-cluster -d \
> -yt hdfs://myhdfs/flink/release/flink-1.11:PUBLIC,hdfs://myhdfs/user/someone/mylib \
> -yD yarn.submission-optimization.enable=true \
> examples/streaming/WindowJoin.jar
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)