[
https://issues.apache.org/jira/browse/HDFS-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887273#comment-15887273
]
Yongjun Zhang commented on HDFS-9868:
-------------------------------------
Thanks [~xiaochen]. I just committed HADOOP-14127.
I found a better way to distribute the conf files with DistributedCache. We
could use
{code}
public void addCacheArchive(URI uri)
{code}
If we create a tar file out of the conf dir and use this API to send the tar
file to the distributed cache, the same tarred directory hierarchy will be
extracted and made available in the current working directory.
See
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/filecache/DistributedCache.html
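
To make the packaging step concrete, here is a minimal sketch of archiving a conf dir while keeping its relative directory hierarchy. It rests on some assumptions: it writes a zip rather than a tar, because the JDK has no built-in tar writer and the DistributedCache documentation lists zip among the archive types it unpacks; the names ConfArchiver and zipConfDir are hypothetical and are not part of the patch. The archive would still have to be copied to a filesystem the job can read before its URI is passed to addCacheArchive.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of HDFS-9868: packs a conf dir into one archive.
public class ConfArchiver {

    /**
     * Writes every regular file under confDir into a zip at archive.
     * Entry names are paths relative to confDir, so the directory
     * hierarchy is preserved when the archive is extracted.
     * The archive path should live outside confDir, otherwise the
     * archive itself would be picked up by the directory walk.
     */
    public static Path zipConfDir(Path confDir, Path archive) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(confDir)) {
            files = walk.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(archive);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names always use forward slashes.
                String name = confDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        // After uploading the archive, its URI would be the argument to
        // addCacheArchive(URI) (assumption: upload handled elsewhere).
        return archive;
    }
}
```

A caller would invoke ConfArchiver.zipConfDir(confDir, archivePath), upload the result, and register the uploaded URI with the job.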
> Add ability for DistCp to run between 2 clusters
> ------------------------------------------------
>
> Key: HDFS-9868
> URL: https://issues.apache.org/jira/browse/HDFS-9868
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: distcp
> Affects Versions: 2.7.1
> Reporter: NING DING
> Assignee: NING DING
> Attachments: HDFS-9868.05.patch, HDFS-9868.06.patch,
> HDFS-9868.07.patch, HDFS-9868.08.patch, HDFS-9868.09.patch,
> HDFS-9868.10.patch, HDFS-9868.1.patch, HDFS-9868.2.patch, HDFS-9868.3.patch,
> HDFS-9868.4.patch
>
>
> Normally the HDFS cluster is HA enabled. Copying a huge amount of data with
> DistCp can take a long time. If the source cluster changes its active NameNode
> during the copy, the DistCp job will fail. This patch lets DistCp read source
> cluster files in HA access mode. A source cluster configuration file needs to
> be specified (via the -sourceClusterConf option).
> The following is an example of the contents of a source cluster
> configuration
> file:
> {code:xml}
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://mycluster</value>
>   </property>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>mycluster</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.mycluster</name>
>     <value>nn1,nn2</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn1</name>
>     <value>host1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn2</name>
>     <value>host2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn1</name>
>     <value>host1:50070</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn2</name>
>     <value>host2:50070</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.mycluster</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
> </configuration>
> {code}
> The invocation of DistCp is as below:
> {code}
> bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
> {code}