[
https://issues.apache.org/jira/browse/FLINK-20681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254086#comment-17254086
]
Ruguo Yu commented on FLINK-20681:
----------------------------------
Hi [~trohrmann], in my opinion yarn.provided.lib.dirs mainly targets the
Flink jars (e.g. flink-dist, lib/, plugins/) rather than user jars or other
third-party jars, and it only works for directories. The description of
this configuration option is as follows:
!image-2020-12-23-20-58-41-234.png|width=849,height=86!
In contrast, yarn.ship-archives and yarn.ship-files better cover the
resources required by users and support single or multiple files, which
makes it easier for users to add additional resources. Unfortunately,
both options currently support only local resources.
Our current use case is that archive files, third-party jars, and job UDF
jars are placed in our remote file system (such as HDFS, because we don’t want
to maintain or store too many file resources on the client). We have enhanced
the two options to support remote resources, for example:
(1) yarn.ship-files=hdfs://namenode/aaa/xxx.jar,hdfs://namenode/dir1/xxx.jar,hdfs://namenode/dir1/yyy.jar,hdfs://namenode/dir1/zzz.jar
(2) yarn.ship-files=hdfs://namenode/dir1,hdfs://namenode/dir2
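To illustrate the idea, here is a minimal sketch (not the actual Flink code) of how a ship-files value could be split into local entries, which the client still uploads, and remote entries, which can be registered directly. The function name split_ship_entries, the set of schemes treated as remote, and the comma separator taken from the examples above are all assumptions made for this illustration.

```python
from urllib.parse import urlparse

# Assumption for illustration: only these schemes count as "remote".
REMOTE_SCHEMES = {"hdfs", "viewfs", "s3a"}


def split_ship_entries(value):
    """Split a comma-separated ship-files value into (local, remote) lists.

    Hypothetical helper: entries whose URI scheme is in REMOTE_SCHEMES are
    treated as remote; everything else (no scheme, file://) is local.
    """
    local, remote = [], []
    for entry in (part.strip() for part in value.split(",")):
        if not entry:
            continue  # ignore empty items such as a trailing comma
        scheme = urlparse(entry).scheme.lower()
        if scheme in REMOTE_SCHEMES:
            remote.append(entry)
        else:
            local.append(entry)
    return local, remote


if __name__ == "__main__":
    value = "hdfs://namenode/dir1,hdfs://namenode/aaa/xxx.jar,/opt/udf/local.jar"
    local, remote = split_ship_entries(value)
    print("upload from client:", local)
    print("register as remote:", remote)
```

With such a split, only the local entries would go through the client-side upload, while the remote ones would be handed to YARN as they are.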
> Support specifying the hdfs path when ship archives or files
> -------------------------------------------------------------
>
> Key: FLINK-20681
> URL: https://issues.apache.org/jira/browse/FLINK-20681
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Affects Versions: 1.12.0
> Reporter: Ruguo Yu
> Priority: Major
> Labels: pull-requests-available
> Fix For: 1.13.0
>
> Attachments: image-2020-12-23-20-58-41-234.png
>
>
> Currently, our team is trying to submit Flink jobs that depend on extra
> resources with the yarn-application target, using two options:
> "yarn.ship-archives" and "yarn.ship-files".
> However, these options only support specifying local resources and shipping
> them to HDFS. If they also supported remote resources on a distributed
> filesystem (such as HDFS), we would get the following benefits:
> * the client skips uploading local resources, which speeds up the job
> submission process
> * YARN caches them on the nodes so that they don't need to be
> downloaded for each application
--
This message was sent by Atlassian Jira
(v8.3.4#803005)