RE: Manage a cluster where not all machines are always available

Segel, Mike Mon, 17 Jan 2011 17:34:02 -0800

Charles,

If I understand you correctly, you want to trim the cluster down to only those 
machines that you control...


Ok... Do you care about the data that is currently on the cluster? 
(Is all of the data yours, or replaceable?)

Can you easily copy the data off the cluster onto plain old Unix file system 
disk space?

If not, then you have to do the following on a node-by-node basis...
A) Put the node in the dfs.exclude file and remove it from the slaves file.
B) As root, run killall -9 java to stop any Java process on the node. (This will 
end your datanode and tasktracker processes.)
C) Wait 10 mins until the job tracker and name node see the node as down.
D) Run hadoop fsck / to find all of the files that are now 
under-replicated.
E) Run balancer to replicate the missing blocks on a different machine.
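
A rough sketch of step A for a single node, in shell. The function name 
exclude_node and the file paths in the usage comment are made up for 
illustration; point them at wherever your conf directory keeps the slaves and 
exclude files. Steps B through E are left as comments because they are meant to 
be run by hand, one node at a time.

```shell
#!/bin/sh
# Step A (hypothetical helper): list a node in the exclude file and
# drop it from the slaves file. All three arguments come from the caller.
exclude_node() {
    node="$1"; slaves="$2"; exclude="$3"

    # Append the host to the exclude file unless it is already listed.
    grep -qxF "$node" "$exclude" 2>/dev/null || printf '%s\n' "$node" >> "$exclude"

    # Rewrite the slaves file without the host (exact whole-line match).
    # grep exits non-zero when nothing is left, which is fine here.
    grep -vxF "$node" "$slaves" > "$slaves.tmp" || true
    mv "$slaves.tmp" "$slaves"
}

# Example call (paths are assumptions, adjust to your install):
#   exclude_node labnode07 conf/slaves conf/dfs.exclude
#
# Remaining steps from the list above, run manually:
#   B) on the node, as root:  killall -9 java
#   C) wait about 10 minutes for the job tracker and name node to notice
#   D) hadoop fsck /
#   E) hadoop balancer
```

The helper only edits the two text files; it does not touch any running 
daemon, so it is safe to try against copies of your config first.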

Of course, it would help if you upped the bandwidth used by the balancer to a 
large number. Normally the balancer is supposed to run in the background, so by 
default it's something like 1 MB/sec. If you've got a 1 Gb Ethernet link, you 
could easily push that number up to 100 or 200 MB/sec. Then when you run the 
balancer, it moves!
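
For the arithmetic, here is a small sketch that assumes the setting takes a 
value in bytes per second. I believe the property is dfs.balance.bandwidthPerSec 
in hdfs-site.xml on 0.20-era releases, but check the hdfs-default.xml that ships 
with your version before relying on that name. The helper name 
mb_to_bytes_per_sec is hypothetical.

```shell
#!/bin/sh
# Convert a rate in MB/sec to bytes per second (1 MB = 1024 * 1024 bytes).
mb_to_bytes_per_sec() {
    echo $(( $1 * 1024 * 1024 ))
}

# Example calls:
#   mb_to_bytes_per_sec 1      # prints 1048576
#   mb_to_bytes_per_sec 100    # prints 104857600
```

The printed number is what would go in the property's value, and the datanodes 
most likely need a restart to pick up the change.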

Note: When we tried decommissioning nodes, I don't know if we had changed this 
parameter, but it was taking 'weeks' to decommission a node. (Your Mileage May 
Vary). Not sure if the long time was due to this parameter being so low, or 
something else. 

What I listed above should work. (Even if it is a bit ugly.)

HTH

-Mike

________________________________________
From: Charles Gonçalves [[email protected]]
Sent: Monday, January 17, 2011 7:07 PM
To: [email protected]
Subject: Manage a cluster where not all machines are always available

Hi Guys,

I'm running a series of Pig scripts on a cluster with a dozen machines.
The problem is that those machines belong to a lab at my university, and
sometimes not all of them are available for my use.
What is the best approach to managing the configuration and the data on HDFS
in this environment?

Can I simply remove the busy servers from the slaves file, start HDFS
and MapReduce, and if needed run:
hadoop balancer

Can you see a problem with this approach?
Can anyone see another way?




--
*Charles Ferreira Gonçalves *
http://homepages.dcc.ufmg.br/~charles/
UFMG - ICEx - Dcc
Cel.: 55 31 87741485
Tel.:  55 31 34741485
Lab.: 55 31 34095840


