[
https://issues.apache.org/jira/browse/FLINK-20681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254145#comment-17254145
]
Ruguo Yu commented on FLINK-20681:
----------------------------------
Specifying a non-HDFS remote location fails with an error saying that the path
format must be "hdfs://namenode/". This is expected, because the filesystem is
created from the YARN configuration in YarnClusterDescriptor and does not
support non-HDFS schemes.
Actually, our remote location is not HDFS but BOS (an object storage system
similar to Aliyun OSS), and it holds resources such as third-party jars and the
user job jar. It is not difficult to support such a DFS, though. In short, we
set "flink.hadoop.fs.bos.impl=xxxClass" in the Flink configuration, which is
passed on as "fs.bos.impl=xxxClass" in the Hadoop configuration, and the jar
that contains xxxClass is available locally. YarnApplicationFileUploader then
registers the remote resources by obtaining the filesystem through Path
getFileSystem and uses FileUtil to copy the resources from BOS to HDFS. This
idea should also be applicable to other DFSs such as S3. The code looks like this:
!image-2020-12-24-01-01-10-021.png|width=650,height=206!
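The screenshot itself is not reproduced in this message. As a rough illustration of the key-forwarding step described above only, here is a minimal, dependency-free sketch. The class HadoopKeyForwarding and its method are hypothetical names, not Flink or Hadoop API, and the "bos://bucket/lib" value is a made-up placeholder; the real change additionally relies on Hadoop's FileSystem and FileUtil, which are not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Flink or Hadoop. */
final class HadoopKeyForwarding {
    /** Prefix that marks Flink options meant for the Hadoop configuration. */
    static final String PREFIX = "flink.hadoop.";

    /** Returns the prefixed entries with the prefix stripped; other keys are ignored. */
    static Map<String, String> toHadoopProperties(Map<String, String> flinkConf) {
        Map<String, String> hadoopConf = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : flinkConf.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                hadoopConf.put(key.substring(PREFIX.length()), entry.getValue());
            }
        }
        return hadoopConf;
    }

    public static void main(String[] args) {
        Map<String, String> flinkConf = new LinkedHashMap<>();
        // "xxxClass" is the placeholder class name used in the comment above.
        flinkConf.put("flink.hadoop.fs.bos.impl", "xxxClass");
        // Placeholder value, assumed for illustration only.
        flinkConf.put("yarn.ship-files", "bos://bucket/lib");
        System.out.println(toHadoopProperties(flinkConf));
    }
}
```

With the key present in the Hadoop configuration and the implementing jar on the local classpath, resolving a filesystem for a "bos://" path becomes possible in the same way as for "hdfs://".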
Finally, the answer to the following question is yes.
{quote}Would then the only change necessary be changing
{{YarnClusterDescriptor.shipFiles}} from {{List<File>}} to {{List<Path>}}?
{quote}
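To show why the element type matters: a java.io.File can only describe a local path, whereas a URI-style path carries a scheme that tells local and remote locations apart. The sketch below is an assumption about how ship locations could be classified, not the actual Flink change. ShipPathClassifier is a hypothetical name, the sample locations are made up, and the inputs are assumed to be well-formed URIs or Unix-style paths (URI.create throws on illegal characters such as spaces).

```java
import java.net.URI;

/** Hypothetical helper for illustration; not part of Flink. */
final class ShipPathClassifier {
    /** A location counts as remote when it has a scheme other than "file". */
    static boolean isRemote(String location) {
        String scheme = URI.create(location).getScheme();
        return scheme != null && !scheme.equalsIgnoreCase("file");
    }

    public static void main(String[] args) {
        // Sample locations are placeholders, assumed for illustration only.
        String[] locations = {
            "/tmp/lib/a.jar",
            "file:///tmp/lib/b.jar",
            "hdfs://namenode/flink/lib",
            "bos://bucket/job.jar"
        };
        for (String location : locations) {
            System.out.println(location + " remote=" + isRemote(location));
        }
    }
}
```

Local entries would still be uploaded by the client, while remote ones could be registered directly as YARN resources.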
> Support specifying the hdfs path when ship archives or files
> -------------------------------------------------------------
>
> Key: FLINK-20681
> URL: https://issues.apache.org/jira/browse/FLINK-20681
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Affects Versions: 1.12.0
> Reporter: Ruguo Yu
> Priority: Major
> Labels: pull-requests-available
> Fix For: 1.13.0
>
> Attachments: image-2020-12-23-20-58-41-234.png,
> image-2020-12-24-01-01-10-021.png
>
>
> Currently, our team is trying to submit a Flink job that depends on extra
> resources with the yarn-application target, using two options:
> "yarn.ship-archives" and "yarn.ship-files".
> However, these options only support specifying local resources and shipping
> them to HDFS. If they also supported remote resources on a distributed
> filesystem (such as HDFS), we would get the following benefits:
> * the client skips uploading the local resources, which accelerates the job
> submission process
> * YARN caches them on the nodes so that they don't need to be
> downloaded for each application
--
This message was sent by Atlassian Jira
(v8.3.4#803005)