Hi, hexrat wrote:
I am looking at Hadoop as a platform for running some Google-like map/reduce programs. One thing I don't understand is how machines join the cluster after processing has begun. It appears the machines in the cluster are configured up front and immutable. Is this so?
Just starting a DataNode/TaskTracker on a new machine and pointing it (via the right configuration) at the correct NameNode/JobTracker respectively adds that machine to the data/compute cluster; see the sketch below. This can be done at any point, and the slaves (DN/TT) don't have to be known a priori, i.e. the framework doesn't assume that all machines are started at time zero.
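
For instance, on the new slave you'd point the daemons at the masters in hadoop-site.xml and then start them. (A minimal sketch: the host names and ports below are placeholders; the property names are the usual fs.default.name / mapred.job.tracker ones, adjust for your setup.)

  <!-- hadoop-site.xml on the new slave; hosts/ports are placeholders -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-host:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-host:9001</value>
  </property>

  # then start the slave daemons on that machine
  $ bin/hadoop-daemon.sh start datanode
  $ bin/hadoop-daemon.sh start tasktracker

The daemons register themselves with the NameNode/JobTracker on startup, so the new machine begins receiving blocks/tasks without restarting the masters.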
My understanding of the Google architecture is that if one or more machines fail, the job scheduler just brings additional machines into the cluster and assigns them tasks. How does this happen in Hadoop, since the machines must be specified in the configuration up front? Am I understanding the architecture accurately? Thanks in advance.
You would need a resource scheduler like Hadoop on Demand (HoD - http://issues.apache.org/jira/browse/HADOOP-1301) to monitor the cluster and grow/shrink it. The Hadoop framework itself doesn't handle bringing additional machines into the cluster when machines fail.
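
To give a rough idea of what that looks like with HoD (the exact options can differ by version; the cluster directory and node count below are just placeholders):

  $ hod allocate -d ~/hod-clusters/test -n 5    # provision a 5-node Hadoop cluster via the resource manager
  ... run your jobs against the allocated cluster ...
  $ hod deallocate -d ~/hod-clusters/test       # release the nodes when done

HoD talks to an underlying resource manager (Torque) to acquire the nodes and brings up the Hadoop daemons on them, which is the piece Hadoop itself doesn't do.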
Arun
