[
https://issues.apache.org/jira/browse/HDFS-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xiao Chen updated HDFS-9868:
----------------------------
Attachment: HDFS-9868.06.patch
After giving this more thought, I am attaching patch 6.
I agree with the current implementation of passing in the source cluster config,
which means distcp should be executed from the target cluster. This is
because I can't find a way to generalize the 'remote' concept for both source
and destination. I have updated the documentation accordingly for clarity, and
tested that this works when running distcp between two HA clusters, as the new
documentation example shows.
Reviews appreciated.
> Add ability to read remote cluster configuration for DistCp
> -----------------------------------------------------------
>
> Key: HDFS-9868
> URL: https://issues.apache.org/jira/browse/HDFS-9868
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: distcp
> Affects Versions: 2.7.1
> Reporter: NING DING
> Assignee: NING DING
> Attachments: HDFS-9868.05.patch, HDFS-9868.06.patch,
> HDFS-9868.1.patch, HDFS-9868.2.patch, HDFS-9868.3.patch, HDFS-9868.4.patch
>
>
> HDFS clusters are normally HA enabled. Copying a large amount of data with
> DistCp can take a long time, and if the source cluster's active NameNode
> changes during the copy, the DistCp job fails. This patch lets DistCp read
> source cluster files in HA access mode. A source cluster configuration file
> must be specified (via the -sourceClusterConf option).
> The following is an example of the contents of a source cluster
> configuration file:
> {code:xml}
> <configuration>
> <property>
> <name>fs.defaultFS</name>
> <value>hdfs://mycluster</value>
> </property>
> <property>
> <name>dfs.nameservices</name>
> <value>mycluster</value>
> </property>
> <property>
> <name>dfs.ha.namenodes.mycluster</name>
> <value>nn1,nn2</value>
> </property>
> <property>
> <name>dfs.namenode.rpc-address.mycluster.nn1</name>
> <value>host1:9000</value>
> </property>
> <property>
> <name>dfs.namenode.rpc-address.mycluster.nn2</name>
> <value>host2:9000</value>
> </property>
> <property>
> <name>dfs.namenode.http-address.mycluster.nn1</name>
> <value>host1:50070</value>
> </property>
> <property>
> <name>dfs.namenode.http-address.mycluster.nn2</name>
> <value>host2:50070</value>
> </property>
> <property>
> <name>dfs.client.failover.proxy.provider.mycluster</name>
> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> </property>
> </configuration>
> {code}
> The invocation of DistCp is as below:
> {code}
> bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
> {code}
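Because a long-running copy only survives a failover if the source cluster file actually carries the HA client settings, it may help to sanity-check the file before launching the job. The following is a minimal sketch, not part of the patch: the helper names (load_properties, missing_ha_keys) and the SAMPLE_CONF string are hypothetical, and the set of keys checked (nameservices, NameNode IDs, RPC addresses, failover proxy provider) is an assumption based on the example configuration above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the source cluster configuration shown above.
SAMPLE_CONF = """
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>host1:9000</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>host2:9000</value></property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
"""


def split_list(value):
    """Split a comma-separated property value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_ha_keys(props):
    """Return the HA client keys (assumed set) that the config leaves unset."""
    nameservices = split_list(props.get("dfs.nameservices", ""))
    if not nameservices:
        return ["dfs.nameservices"]
    missing = []
    for ns in nameservices:
        nn_key = f"dfs.ha.namenodes.{ns}"
        namenodes = split_list(props.get(nn_key, ""))
        if not namenodes:
            missing.append(nn_key)
        # Every listed NameNode ID needs its own RPC address.
        for nn in namenodes:
            rpc_key = f"dfs.namenode.rpc-address.{ns}.{nn}"
            if not props.get(rpc_key):
                missing.append(rpc_key)
        provider_key = f"dfs.client.failover.proxy.provider.{ns}"
        if not props.get(provider_key):
            missing.append(provider_key)
    return missing


if __name__ == "__main__":
    print(missing_ha_keys(load_properties(SAMPLE_CONF)))
```

An empty list means every key the sketch looks for is present; it does not verify that the hosts are reachable or that the values are correct.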