Thanks! I was looking at the link sent by Philip. The copy is done with the
following command:

hadoop distcp hdfs://nn1:8020/foo/bar \
    hdfs://nn2:8020/bar/foo

I was wondering whether nn1 and nn2 are the names of the clusters or the
names of the master nodes on each cluster.

I want the map/reduce tasks running on each of the two clusters to communicate
with each other. I don't know whether Hadoop provides synchronization between
two map/reduce tasks. The tasks run simultaneously, and they need to access a
common file: something like a higher-level map/reduce task using the data
produced by a lower-level map/reduce task.
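
To make the higher-level/lower-level idea concrete, here is a rough sketch of the closest thing I can picture with distcp. It assumes the two jobs can run one after the other rather than simultaneously, so it does not cover the synchronization part of the question. The jar names, class names, paths, and config directories (lower.jar, LowerJob, /data/..., conf-cluster1, and so on) are made up for illustration. It also assumes nn1 and nn2 are the hosts serving HDFS on each cluster, which is exactly the part I am unsure about. By default the script only prints the commands; setting EXECUTE=1 would run them.

```shell
#!/bin/sh
# Sketch only: chain a lower-level job on cluster 1 into a higher-level job
# on cluster 2, using distcp as the hand-off between them.
# All jar/class names, paths, and config directories are hypothetical.

# Assumption: nn1 and nn2 are the hosts that serve HDFS for each cluster.
SRC="hdfs://nn1:8020"
DST="hdfs://nn2:8020"

# Assumption: one client config directory per cluster, so each job is
# submitted to the intended cluster.
CONF1="conf-cluster1"
CONF2="conf-cluster2"

# Print each command by default; set EXECUTE=1 to really run it.
run() {
    if [ "${EXECUTE:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# 1. Lower-level job writes its output on cluster 1.
run hadoop --config "$CONF1" jar lower.jar LowerJob \
    "$SRC/data/input" "$SRC/data/lower-out" || exit 1

# 2. Copy that output across to cluster 2.
run hadoop distcp "$SRC/data/lower-out" "$DST/data/lower-out" || exit 1

# 3. Higher-level job on cluster 2 reads the copied data.
run hadoop --config "$CONF2" jar higher.jar HigherJob \
    "$DST/data/lower-out" "$DST/data/final" || exit 1
```

What I would really like is for steps 1 and 3 to overlap, with the higher-level job picking up data as the lower-level one produces it.
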
Mithila
On Tue, Apr 7, 2009 at 7:57 AM, Owen O'Malley wrote:
>
> On Apr 6, 2009, at 9:49 PM, Mithila Nagendra wrote:
>
>> Hey all
>> I'm trying to connect two separate Hadoop clusters. Is it possible to do
>> so?
>> I need data to be shuttled back and forth between the two clusters. Any
>> suggestions?
>>
>
> You should use hadoop distcp. It is a map/reduce program that copies data,
> typically from one cluster to another. If you have the hftp interface
> enabled, you can use that to copy between hdfs clusters that are different
> versions.
>
> hadoop distcp hftp://namenode1:1234/foo/bar hdfs://foo/bar
>
> -- Owen
>