Re: Changing hostnames of tasktracker/datanode nodes - any problems?

Allen Wittenauer Tue, 10 Aug 2010 11:02:16 -0700

On Aug 10, 2010, at 10:54 AM, Bill Graham wrote:
> Is it correct to say that that would work fine? We have a replication factor
> of 2, so we'd be copying twice as much data as we'd need to, so I'm sure
> there's a more efficient approach.


It should work fine.  But yes, highly inefficient.

> What about adding the new nodes in the new colo to the existing cluster,
> rebalancing and then decommissioning the old cluster nodes before finally
> migrating the NN/SNN? I know Hadoop isn't intended to run cross-colo, but
> would this be a more efficient approach than the one above?

If you can keep both grids up at the same time, use distcp to do the copy.
This will make sure the blocks get copied only once, will keep permissions
with -p, keep the replication factor, and redistribute the data across the
new nodes (free balancing!).
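
As a rough sketch, the copy might look something like the following. The
NameNode hostnames, port, and /data path are placeholders invented for
illustration, not details from this thread, and the script only prints the
command unless RUN=1 is set. Per the DistCp documentation, -p with no letters
is equivalent to -prbugp (replication, block size, user, group, permission).

```shell
# Hypothetical source (old colo) and destination (new colo) NameNodes.
# Substitute your own hosts, port, and path.
SRC="hdfs://old-nn.example.com:8020/data"
DST="hdfs://new-nn.example.com:8020/data"

# -p preserves file status, including permissions and replication factor.
CMD="hadoop distcp -p $SRC $DST"

# Dry run by default: print the command. Set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
    $CMD
else
    echo "$CMD"
fi
```

Both clusters need to be up and reachable from wherever the job runs, and it
is worth trying a small directory first before pointing it at everything.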
