Hi,

hexrat wrote:
I am looking at Hadoop as a platform for running some Google-like
map/reduce programs.  One thing I don't understand is how machines
come into the cluster after processing has begun.  It appears the machines
in the cluster are configured up front and are immutable.  Is this so?


Just starting a DataNode/TaskTracker on a new machine and pointing it (via the right configuration) at the correct NameNode/JobTracker respectively ensures that you have more machines in the data/compute cluster. This can be done at any point, and the slaves (DN/TT) don't have to be known a priori, i.e. the framework doesn't assume that all machines are started at time zero.
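
Concretely, something along these lines in conf/hadoop-site.xml on the new node (the hostnames and ports below are just placeholders for your setup), followed by starting the two daemons:

  <property>
    <!-- where the NameNode is running -->
    <name>fs.default.name</name>
    <value>hdfs-master.example.com:9000</value>
  </property>
  <property>
    <!-- where the JobTracker is running -->
    <name>mapred.job.tracker</name>
    <value>mr-master.example.com:9001</value>
  </property>

  # start the slave daemons on the new machine
  $ bin/hadoop-daemon.sh start datanode
  $ bin/hadoop-daemon.sh start tasktracker

The new DataNode registers itself with the NameNode and the new TaskTracker starts heartbeating to the JobTracker, which then schedules tasks on it like on any other slave.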

My understanding of the Google architecture is that if one or more machines
fail, the job scheduler just brings additional machines into the cluster and
assigns them tasks.  How does this happen in Hadoop, since the machines must
be specified by config up front?  Am I understanding the architecture
accurately?  Thanks in advance.

You would need a resource scheduler like Hadoop on Demand (HoD - http://issues.apache.org/jira/browse/HADOOP-1301) to monitor the cluster and increase/decrease its size. The Hadoop framework itself doesn't handle 'bringing in additional machines' when machines fail.
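
To give a rough idea of how HoD is used (the command-line below is illustrative and may differ from what eventually gets committed): you ask HoD for a cluster of N nodes from the underlying resource manager, run your jobs against it, and give the nodes back when you're done.

  # illustrative HoD session - syntax is an approximation
  $ hod allocate -d ~/hod-clusters/test -n 10   # provision a 10-node Hadoop cluster
  $ hod deallocate -d ~/hod-clusters/test       # return the nodes when finished

Growing or shrinking the cluster is then a matter of asking for a different number of nodes, rather than something the JobTracker does on its own.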

Arun
