[
https://issues.apache.org/jira/browse/FLINK-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830330#comment-16830330
]
Zhenqiu Huang commented on FLINK-12343:
---------------------------------------
[~till.rohrmann][~rmetzger]
I think we can set hdfs.replication in the YarnConfiguration of
AbstractYarnClusterDescriptor. Because this configuration is only used on the
client side, it will not affect the replication of files written at runtime.
I initially chose the setReplication method because our org will use S3 in
the long term to submit jobs to different cluster management systems, and I
wanted to apply the replication factor to both HDFS and S3. But it looks like
S3AFileSystem doesn't implement the method, so I think it is good to use
hdfs.replication initially. What do you think?
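
To illustrate why a client-side setting would not touch runtime files, here is
a minimal sketch that uses a plain java.util.Map as a stand-in for the Hadoop
configuration object. It is not Flink or Hadoop code: the class and method
names are hypothetical, and the key "dfs.replication" is an assumption about
which Hadoop property the comment's "hdfs.replication" refers to.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in model; not part of Flink or Hadoop. */
final class ReplicationSketch {
    // Assumed key: Hadoop's default block replication property.
    static final String REPLICATION_KEY = "dfs.replication";
    // Default of 3 replicas, as stated in the issue description.
    static final int DEFAULT_REPLICATION = 3;

    /** Returns a copy of the base config with the replication overridden. */
    static Map<String, String> withClientReplication(
            Map<String, String> base, int replication) {
        if (replication < 1) {
            throw new IllegalArgumentException("replication must be >= 1");
        }
        // Copy first, so the override stays local to the submitting client.
        Map<String, String> copy = new HashMap<>(base);
        copy.put(REPLICATION_KEY, Integer.toString(replication));
        return copy;
    }

    /** Replication a writer using this config would request. */
    static int effectiveReplication(Map<String, String> conf) {
        String value = conf.get(REPLICATION_KEY);
        return value == null ? DEFAULT_REPLICATION : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> runtimeConf = new HashMap<>();
        Map<String, String> clientConf = withClientReplication(runtimeConf, 1);
        System.out.println("client upload: " + effectiveReplication(clientConf));
        System.out.println("runtime files: " + effectiveReplication(runtimeConf));
    }
}
```

Only the copy held by the submitting client carries the lower factor, so jar
uploads would use it while anything written with the unmodified configuration
keeps the default.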
> Allow set file.replication in Yarn Configuration
> ------------------------------------------------
>
> Key: FLINK-12343
> URL: https://issues.apache.org/jira/browse/FLINK-12343
> Project: Flink
> Issue Type: Improvement
> Components: Command Line Client, Deployment / YARN
> Affects Versions: 1.6.4, 1.7.2, 1.8.0
> Reporter: Zhenqiu Huang
> Assignee: Zhenqiu Huang
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, FlinkYarnSessionCli uploads jars to HDFS with the default
> replication factor of 3. From our production experience, we find that 3
> replicas can block a big job (256 containers) from launching when HDFS is
> slow due to heavy workload from batch pipelines. Thus, we want to make the
> factor customizable from FlinkYarnSessionCli by adding an option.