I have a test Hadoop cluster set up using Cloudera. It consists of the NameNode and three DataNodes. When I submit jobs, the tasks pile up on one node instead of round-robining across the different nodes.
I understand that Hadoop tries to run tasks where the data is located, but with only three DataNodes and a replication factor of 3, wouldn't that mean the same data is on every single machine? Why would it not spread the tasks out over all of the machines instead of clumping them onto one and leaving the others idle? Thanks.
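For what it's worth, I've been assuming the blocks really are replicated to all three DataNodes. Here's a rough sketch of how one could double-check that with the HDFS `FileSystem` API (the class name and the path argument are just placeholders for one of the input files; `hdfs fsck <path> -files -blocks -locations` should show the same information from the command line):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath,
        // e.g. when run via `hadoop jar`
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder: path to one of the job's input files
        Path path = new Path(args[0]);
        FileStatus status = fs.getFileStatus(path);

        // For each block of the file, print which DataNodes hold a replica.
        // With replication factor 3 on a 3-node cluster, every block
        // should list all three hosts.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset %d, length %d -> hosts: %s%n",
                    block.getOffset(), block.getLength(),
                    String.join(", ", block.getHosts()));
        }
    }
}
```

If every block lists all three hosts, then data locality shouldn't be the reason the scheduler keeps picking the same node.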
