Online, you can copy the files/folders to a new location, delete the originals, and then rename the copies back to the original names. The newly written data will be distributed uniformly across the datanodes. Distcp is useful for copying large amounts of data.

Offline, you can move DFS block files from the old machines to the new machine (scp src:<dfs.data.dir>/blk_????? dst:<dfs.data.dir>). On startup, each datanode reports its blocks to the namenode, and the namenode will make sense of it all.
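
For illustration only, here is a rough Python sketch of the online copy / delete / rename pass. It only builds and prints the commands (a dry run) and executes nothing. The function name, the ".rebalance" suffix, and the /data/example path are made up for this example, and the -cp, -rmr and -mv options are assumed from the classic "hadoop dfs" shell, so check what your version supports before running anything.

```python
import shlex


def rebalance_commands(path, tmp_suffix=".rebalance", hadoop="bin/hadoop"):
    """Build the commands for one copy / delete / rename pass over `path`.

    Hypothetical helper: nothing is executed, the commands are only returned.
    """
    path = path.rstrip("/")
    if not path:
        raise ValueError("refusing to operate on the DFS root")
    tmp = path + tmp_suffix
    return [
        [hadoop, "dfs", "-cp", path, tmp],   # fresh copy, so blocks are written anew
        [hadoop, "dfs", "-rmr", path],       # delete the original (destructive)
        [hadoop, "dfs", "-mv", tmp, path],   # rename the copy back to the old name
    ]


if __name__ == "__main__":
    # /data/example is a placeholder path, not one from this thread.
    for cmd in rebalance_commands("/data/example"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

If you do run the printed commands, run each one only after the previous one has succeeded, since the delete step cannot be undone.
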
There's no fancy method of rebalancing, let alone proportional block assignment.

Yoram

> -----Original Message-----
> From: David Pollak [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 25, 2006 9:35 AM
> To: [email protected]
> Subject: Rebalancing a DFS cluster
>
> Howdy,
>
> I've got a DFS cluster. I added a new machine to my cluster. The
> new machine is the fastest in the cluster (a Core 2 Duo E6600 which
> blows every machine I've ever used out of the water... but I
> digress.) I'd like to rebalance some of the files in my DFS cluster
> so this machine has files on its local filesystem. Is there a way to
> tell DFS to rebalance and (okay this is a wish-list item) put a
> "speed factor" on each of the slaves so that faster machines will get
> more data on their local drives so the machines that run faster are
> more likely to have local data.
>
> Thanks,
>
> David
