[ https://issues.apache.org/jira/browse/HDFS-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172001#comment-15172001 ]
NING DING commented on HDFS-9868:
---------------------------------
[~qwertymaniac], thanks for pointing out HDFS-6376. I had not found that patch
before.
Yes, HDFS-6376 and my patch solve the same issue, but their solutions are
different.
I'm using Hadoop 2.7.1, but I cannot find a description of the
dfs.internal.nameservices configuration property in any Hadoop documentation.
I can only find its definition in hdfs-default.xml, and from the description
there it is hard to associate it with the issue of DistCp reading files from
an HA cluster.
I suggest that it be mentioned in the DistCp guide, with an example. If I
missed a document that mentions it, please tell me. Thanks.
DistCp is a tool, and in fact it is just a MapReduce program.
To use the HDFS-6376 patch, the local cluster must update its
dfs.internal.nameservices and dfs.nameservices configuration values.
In a DistCp job, each map task acts as a Hadoop client that reads the source
HA cluster's files, so the environment of every NodeManager must have
dfs.internal.nameservices and dfs.nameservices set correctly.
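For illustration, here is a minimal sketch of what such a configuration might
look like in the local cluster's hdfs-site.xml. It is an assumption about a
typical setup, not something taken from the HDFS-6376 patch itself. The
nameservice ID localcluster is hypothetical; mycluster, host1 and host2 are
reused from the example in the issue description.
{code:xml}
<configuration>
  <!-- Every nameservice a client may resolve: the local one (hypothetical ID)
       and the remote source cluster. -->
  <property>
    <name>dfs.nameservices</name>
    <value>localcluster,mycluster</value>
  </property>
  <!-- Only the nameservices that belong to this cluster. -->
  <property>
    <name>dfs.internal.nameservices</name>
    <value>localcluster</value>
  </property>
  <!-- NameNodes of the remote source cluster. -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>host1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>host2:9000</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- The matching dfs.ha.namenodes.localcluster and rpc-address entries for
       the local cluster are omitted from this sketch. -->
</configuration>
{code}
With settings like these visible to every map task, the source should be
addressable by its logical URI (for example hdfs://mycluster/foo/bar) rather
than by a single namenode host.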
I don't think it is appropriate to update the cluster configuration just to
run an MR job.
I'm just a Hadoop cluster end user, not the cluster administrator.
To run DistCp, I would have to ask the administrator to change the Hadoop
configuration, which may not be appropriate.
So I think my HDFS-9868 patch could be an alternative way to solve the issue
of DistCp reading files from an HA cluster.
My patch only requires passing -sourceClusterConf <source_cluser_conf_file> on
the command line. It's non-invasive and easy to understand. My patch may not
be accepted upstream in the end, but I think it could still help some Hadoop
end users solve this issue easily.
I added instructions for the -sourceClusterConf <source_cluser_conf_file>
parameter to DistCp.md.vm.
HDFS-9868.1.patch is based on hadoop-2.7.1.
HDFS-9868.2.patch is based on Hadoop trunk, which is hadoop-3.0.0.
> add reading source cluster with HA access mode feature for DistCp
> -----------------------------------------------------------------
>
> Key: HDFS-9868
> URL: https://issues.apache.org/jira/browse/HDFS-9868
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: distcp
> Affects Versions: 2.7.1
> Reporter: NING DING
> Assignee: NING DING
> Attachments: HDFS-9868.1.patch, HDFS-9868.2.patch
>
>
> Normally the HDFS cluster is HA enabled. Copying a large amount of data with
> distcp can take a long time. If the source cluster's active namenode changes
> during the copy, the distcp job fails. This patch lets DistCp read source
> cluster files in HA access mode. A source cluster configuration file needs to
> be specified (via the -sourceClusterConf option).
> The following is an example of the contents of a source cluster configuration
> file:
> {code:xml}
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://mycluster</value>
>   </property>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>mycluster</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.mycluster</name>
>     <value>nn1,nn2</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn1</name>
>     <value>host1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn2</name>
>     <value>host2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn1</name>
>     <value>host1:50070</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn2</name>
>     <value>host2:50070</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.mycluster</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
> </configuration>
> {code}
> The invocation of DistCp is as below:
> {code}
> bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
> {code}