[ 
https://issues.apache.org/jira/browse/HDFS-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172001#comment-15172001
 ] 

NING DING commented on HDFS-9868:
---------------------------------

[~qwertymaniac], thanks for pointing out HDFS-6376. I hadn't found that patch 
before.
Yes, HDFS-6376 and my patch address the same issue, but their solutions are 
different.
I'm using Hadoop 2.7.1, but I cannot find a description of the 
dfs.internal.nameservices configuration property in any Hadoop doc.
I can only find its definition in hdfs-default.xml, and from the description 
there it's hard to associate it with the issue of DistCp reading files from an 
HA cluster.
I suggest it be mentioned in the DistCp guide, with an example. If I missed a 
doc that mentions it, please tell me. Thanks.
DistCp is a tool, and in fact it's just a MapReduce program.
To use the HDFS-6376 patch, the local cluster must update its 
dfs.internal.nameservices and dfs.nameservices configuration values.
In a DistCp job, each map task is a Hadoop client that reads the source HA 
cluster's files, so the environment of every NodeManager must have 
dfs.internal.nameservices and dfs.nameservices set correctly.
I don't think it's appropriate to update the cluster configuration just to run 
an MR job.
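As a rough sketch of what that would involve (this is my reading of the 
hdfs-default.xml description, not something taken from a guide), the 
hdfs-site.xml seen by every NodeManager would need entries along these lines. 
Here "localcluster" is a made-up name for the local nameservice, and mycluster, 
host1 and host2 are the source cluster names used in the example in the issue 
description:
{code:xml}
<!-- Sketch only: "localcluster" is a hypothetical local nameservice name. -->
<property>
  <!-- All nameservices a client may address: local plus the remote source. -->
  <name>dfs.nameservices</name>
  <value>localcluster,mycluster</value>
</property>
<property>
  <!-- Only the nameservices that belong to this cluster. -->
  <name>dfs.internal.nameservices</name>
  <value>localcluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>host1:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>host2:9000</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
{code}
With that in place the source could be addressed as hdfs://mycluster/... in the 
DistCp command, but only after the cluster-wide configuration has been changed.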
I'm just a Hadoop cluster end user, not the cluster administrator. 
To run DistCp, I would have to ask the administrator to change the Hadoop 
configuration, which may not be appropriate.
So I think my HDFS-9868 patch could be an alternative way to solve the issue of 
DistCp reading files from an HA cluster.
My patch only needs -sourceClusterConf <source_cluster_conf_file> on the 
command line. It's non-invasive and easy to understand. Maybe my patch will not 
be adopted upstream in the end, but I think it could still help some Hadoop end 
users solve this issue easily.
I added instructions for the -sourceClusterConf <source_cluster_conf_file> 
parameter to DistCp.md.vm.
HDFS-9868.1.patch is based on hadoop-2.7.1.
HDFS-9868.2.patch is based on Hadoop trunk, which is hadoop-3.0.0.

> add reading source cluster with HA access mode feature for DistCp
> -----------------------------------------------------------------
>
>                 Key: HDFS-9868
>                 URL: https://issues.apache.org/jira/browse/HDFS-9868
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: distcp
>    Affects Versions: 2.7.1
>            Reporter: NING DING
>            Assignee: NING DING
>         Attachments: HDFS-9868.1.patch, HDFS-9868.2.patch
>
>
> Normally the HDFS cluster is HA enabled. Copying huge amounts of data with 
> DistCp can take a long time. If the source cluster changes its active 
> NameNode during the copy, DistCp will fail. This patch lets DistCp read 
> source cluster files in HA access mode. A source cluster configuration file 
> needs to be specified (via the -sourceClusterConf option).
>   The following is an example of the contents of a source cluster 
> configuration
>   file:
> {code:xml}
>     <configuration>
>       <property>
>         <name>fs.defaultFS</name>
>         <value>hdfs://mycluster</value>
>       </property>
>       <property>
>         <name>dfs.nameservices</name>
>         <value>mycluster</value>
>       </property>
>       <property>
>         <name>dfs.ha.namenodes.mycluster</name>
>         <value>nn1,nn2</value>
>       </property>
>       <property>
>         <name>dfs.namenode.rpc-address.mycluster.nn1</name>
>         <value>host1:9000</value>
>       </property>
>       <property>
>         <name>dfs.namenode.rpc-address.mycluster.nn2</name>
>         <value>host2:9000</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.mycluster.nn1</name>
>         <value>host1:50070</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.mycluster.nn2</name>
>         <value>host2:50070</value>
>       </property>
>       <property>
>         <name>dfs.client.failover.proxy.provider.mycluster</name>
>         <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>       </property>
>     </configuration>
> {code}
>   The invocation of DistCp is as below:
> {code}
>     bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
