Charles,

If I understand you correctly, you want to trim the cluster down to only those machines that you control...
Ok... Do you care about the data that is currently on the cluster? (Is all of the data yours, or replaceable?) Can you easily copy the data off the cluster onto plain old Unix file system disk space?

If not, then you have to do the following on a node-by-node basis:

A) Put the node in the dfs.exclude file (the exclude file named by the dfs.hosts.exclude property) and remove it from the slaves file.
B) As root, run killall -9 java to stop any Java processes. (This will end your DataNode and TaskTracker daemons.)
C) Wait 10 minutes until the JobTracker and NameNode see the node as down.
D) Run hadoop fsck / to find all of the files that are now under-replicated.
E) Run the balancer to replicate the missing blocks on a different machine.

Of course, it would help if you raised the bandwidth the balancer is allowed to use to a large number. The balancer is normally meant to run in the background, so by default it is limited to something like 1 MB/sec. If you've got a 1 Gb Ethernet link, you could easily push that number up to 100 or 200 MB/sec. Then when you run the balancer, it really moves!

Note: When we tried decommissioning nodes, I don't know whether we had changed this parameter, but it was taking "weeks" to decommission a node. (Your mileage may vary.) I'm not sure whether the long time was due to this parameter being so low or to something else.

What I listed above should work (even if it is a bit ugly).

HTH
-Mike

________________________________________
From: Charles Gonçalves [[email protected]]
Sent: Monday, January 17, 2011 7:07 PM
To: [email protected]
Subject: Manage a cluster where not all machines are always available

Hi Guys,

I'm running a series of Pig scripts on a cluster of a dozen machines. The problem is that those machines belong to a lab at my university, and sometimes not all of them are available for my use.
What is the best approach to managing the configuration and the data on HDFS in this environment?
Can I simply remove the busy servers from the slaves file, start HDFS and MapReduce, and, if needed, run hadoop balancer?
Can you see a problem with this approach? Can anyone see another way?

--
Charles Ferreira Gonçalves
http://homepages.dcc.ufmg.br/~charles/
UFMG - ICEx - Dcc
Cel.: 55 31 87741485
Tel.: 55 31 34741485
Lab.: 55 31 34095840

The information contained in this communication may be CONFIDENTIAL and is intended only for the use of the recipient(s) named above. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you have received this communication in error, please notify the sender and delete/destroy the original message and any copy of it from your computer or paper files.
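For anyone scripting this, here is a rough shell sketch of step A) of Mike's reply and of the balancer bandwidth change he mentions. It is a sketch under assumptions, not a tested procedure: remove_node and balancer_bandwidth_property are hypothetical helper names, the exclude file is assumed to be whichever file dfs.hosts.exclude points to, and dfs.balance.bandwidthPerSec (bytes per second, default 1048576, i.e. 1 MB/sec) is assumed to be the property name for a 0.20-era release, so check hdfs-default.xml for your version. Keep in mind that a 1 Gb/s link carries at most about 125 MB/sec.

```shell
#!/bin/sh
# Hypothetical helpers sketching step A) and the balancer bandwidth setting.
# Assumptions: the exclude file is the one named by dfs.hosts.exclude,
# the slaves file is conf/slaves, and the bandwidth property is the
# 0.20-era dfs.balance.bandwidthPerSec (value in bytes per second).

# Step A): list the host in the exclude file and drop it from the slaves file.
remove_node() {
    host="$1"
    exclude_file="$2"
    slaves_file="$3"

    # Append the host to the exclude file only if it is not already listed
    # (-F: fixed string, so dots in hostnames are literal; -x: whole line).
    touch "$exclude_file"
    if ! grep -qxF "$host" "$exclude_file"; then
        echo "$host" >> "$exclude_file"
    fi

    # Rewrite the slaves file without that host. grep exits 1 when no
    # lines remain, so ignore that status and still replace the file.
    grep -vxF "$host" "$slaves_file" > "$slaves_file.tmp" || true
    mv "$slaves_file.tmp" "$slaves_file"
}

# Print an hdfs-site.xml property raising the balancer bandwidth,
# converting a value given in MB/sec into bytes/sec.
balancer_bandwidth_property() {
    mb_per_sec="$1"
    bytes=$((mb_per_sec * 1024 * 1024))
    printf '<property>\n'
    printf '  <name>dfs.balance.bandwidthPerSec</name>\n'
    printf '  <value>%s</value>\n' "$bytes"
    printf '</property>\n'
}
```

For example, remove_node node07 conf/excludes conf/slaves (hostname and paths are hypothetical) would handle step A) for one machine; steps B) through E) are still done by hand as described. The output of balancer_bandwidth_property 100 would go into hdfs-site.xml; as far as I know the setting is read by each DataNode, so the DataNodes probably need a restart to pick it up.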
