[
https://issues.apache.org/jira/browse/HDFS-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xiao Chen updated HDFS-9868:
----------------------------
Attachment: HDFS-9868.05.patch
I'm attaching patch 5 to help move this forward; [~iceberg565], I hope you don't
mind. Thanks again for the work so far. Feel free to let me know if you want to
continue working on this.
Here's what's in patch 5:
- Rebased on the latest trunk, mainly due to HDFS-9640, as [~jojochuang] pointed out.
- Addressed the comments above.
- Made various minor (nit-level) modifications based on my review.
A more general comment I'm still trying to address is that 'source' seems vague
here. It really depends on where the {{distcp}} command is run. In the doc
example, it actually looks more like a 'destination' config, so I'm thinking of
generalizing it as a 'remote' configuration. Additionally, it seems we should
accept a directory so that both {{hdfs-site.xml}} and {{core-site.xml}} can be
read. There may also be some MR/YARN-level changes needed; I'll test and see.
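To make the directory idea concrete, here is a rough sketch in Python. It is not the patch code (DistCp itself is Java), and the names {{load_remote_conf}} and {{namenode_rpc_addresses}} are hypothetical. It assumes the directory holds standard Hadoop site files made of {{<property>}} elements with {{<name>}} and {{<value>}} children, merges whichever of {{core-site.xml}} and {{hdfs-site.xml}} are present, and then looks up the namenode RPC addresses of one HA nameservice.

```python
import os
import xml.etree.ElementTree as ET


def load_remote_conf(conf_dir, filenames=("core-site.xml", "hdfs-site.xml")):
    """Merge <property> entries from the site files found in conf_dir.

    Hypothetical helper: files are read in the given order, later files
    override earlier ones, and files that do not exist are skipped.
    """
    conf = {}
    for filename in filenames:
        path = os.path.join(conf_dir, filename)
        if not os.path.isfile(path):
            continue
        for prop in ET.parse(path).getroot().iter("property"):
            name = prop.findtext("name")
            value = prop.findtext("value")
            if name is not None and value is not None:
                conf[name.strip()] = value.strip()
    return conf


def namenode_rpc_addresses(conf, nameservice):
    """Return {namenode_id: rpc_address} for one HA nameservice.

    Raises KeyError if a namenode listed under dfs.ha.namenodes.<ns>
    has no matching dfs.namenode.rpc-address.<ns>.<id> entry.
    """
    ids = conf.get("dfs.ha.namenodes." + nameservice, "")
    addresses = {}
    for nn_id in (part.strip() for part in ids.split(",")):
        if not nn_id:
            continue
        key = "dfs.namenode.rpc-address.%s.%s" % (nameservice, nn_id)
        if key not in conf:
            raise KeyError("missing " + key)
        addresses[nn_id] = conf[key]
    return addresses
```

With a directory containing the properties from the example in the issue description, {{namenode_rpc_addresses(load_remote_conf(conf_dir), "mycluster")}} would map {{nn1}} and {{nn2}} to their RPC addresses. Whether later files should override earlier ones is a design choice for the real option, not something the patch settles.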
> add reading source cluster with HA access mode feature for DistCp
> -----------------------------------------------------------------
>
> Key: HDFS-9868
> URL: https://issues.apache.org/jira/browse/HDFS-9868
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: distcp
> Affects Versions: 2.7.1
> Reporter: NING DING
> Assignee: NING DING
> Attachments: HDFS-9868.05.patch, HDFS-9868.1.patch,
> HDFS-9868.2.patch, HDFS-9868.3.patch, HDFS-9868.4.patch
>
>
> Normally the HDFS cluster is HA enabled. Copying a large amount of data with
> distcp can take a long time. If the source cluster changes its active namenode
> during the copy, the distcp job fails. This patch lets DistCp read source
> cluster files in HA access mode. A source cluster configuration file needs to
> be specified (via the -sourceClusterConf option).
> The following is an example of the contents of a source cluster configuration
> file:
> {code:xml}
> <configuration>
> <property>
> <name>fs.defaultFS</name>
> <value>hdfs://mycluster</value>
> </property>
> <property>
> <name>dfs.nameservices</name>
> <value>mycluster</value>
> </property>
> <property>
> <name>dfs.ha.namenodes.mycluster</name>
> <value>nn1,nn2</value>
> </property>
> <property>
> <name>dfs.namenode.rpc-address.mycluster.nn1</name>
> <value>host1:9000</value>
> </property>
> <property>
> <name>dfs.namenode.rpc-address.mycluster.nn2</name>
> <value>host2:9000</value>
> </property>
> <property>
> <name>dfs.namenode.http-address.mycluster.nn1</name>
> <value>host1:50070</value>
> </property>
> <property>
> <name>dfs.namenode.http-address.mycluster.nn2</name>
> <value>host2:50070</value>
> </property>
> <property>
> <name>dfs.client.failover.proxy.provider.mycluster</name>
> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> </property>
> </configuration>
> {code}
> The invocation of DistCp is as below:
> {code}
> bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
> {code}