Re: rack-awareness for hdfs

Doug Cutting Tue, 18 Sep 2007 08:37:48 -0700

Jeff Hammerbacher wrote:

thanks for the quick reply.  this is an interesting scenario that you bring
up: a large hdfs pool across data centers with, perhaps, a jobtracker per
data center (or a jobtracker per rack).  i'm still not clear how
rack-locality helps map input performance here;

You could do it that way; however, I was imagining striping sub-clusters across racks, so that, say, each of 4 sub-clusters contains 25% of the hosts in each rack.
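
To make the striping idea concrete, here is a minimal sketch in Python. It is not Hadoop code: the function name `stripe_hosts`, the rack and host names, and the round-robin assignment are all assumptions chosen for illustration. It only shows how each sub-cluster could end up with an equal share of every rack's hosts.

```python
from collections import defaultdict


def stripe_hosts(hosts_by_rack, num_subclusters=4):
    """Assign the hosts of every rack round-robin to sub-clusters.

    Hypothetical helper: returns {sub-cluster index: [(rack, host), ...]}.
    """
    subclusters = defaultdict(list)
    for rack, hosts in sorted(hosts_by_rack.items()):
        # Restart the round-robin for each rack so every sub-cluster
        # receives roughly 1/num_subclusters of that rack's hosts.
        for i, host in enumerate(sorted(hosts)):
            subclusters[i % num_subclusters].append((rack, host))
    return dict(subclusters)


# Made-up topology: two racks with eight hosts each.
racks = {
    "rack1": [f"r1-h{i}" for i in range(8)],
    "rack2": [f"r2-h{i}" for i in range(8)],
}
striped = stripe_hosts(racks)
```

With eight hosts per rack and four sub-clusters, each sub-cluster gets two hosts from every rack, so every sub-cluster has a presence in every rack.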

JobInProgress.findNewTask seems to see the world as local or other,
with no rack-awareness.  are you
referring to the ReplicationTargetChooser, which will always put one of the
three replicas (assuming your replication level is 3) on your rack, hence
increasing your chances of finding the block within your sub-jobtracker net?

That works too, but I was thinking that, even if a host in your cluster does not contain a block, a map task could be placed on a host in a rack that contains the block, so the map input would be rack-local.
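
A rough sketch of that placement preference, again in Python and again not the actual JobInProgress logic: `pick_host`, the `rack_of` mapping, and the host names are hypothetical. It prefers a host that holds the block, then a host in a rack that holds the block, and only then falls back to any idle host.

```python
def pick_host(block_hosts, idle_hosts, rack_of):
    """Pick an idle host for a map task, preferring closer input data.

    Hypothetical helper: returns (host, locality), or (None, None) if
    no host is idle. rack_of maps every known host to its rack.
    """
    block_hosts = set(block_hosts)

    # 1. Node-local: an idle host that stores a replica of the block.
    for host in idle_hosts:
        if host in block_hosts:
            return host, "node-local"

    # 2. Rack-local: an idle host in a rack that stores a replica.
    block_racks = {rack_of[h] for h in block_hosts}
    for host in idle_hosts:
        if rack_of[host] in block_racks:
            return host, "rack-local"

    # 3. Otherwise take any idle host; the input crosses racks.
    if idle_hosts:
        return idle_hosts[0], "off-rack"
    return None, None


# Made-up topology: the only replica is on a1, which is outside our
# sub-cluster, but a2 shares its rack.
rack_of = {"a1": "rackA", "a2": "rackA", "b1": "rackB", "b2": "rackB"}
choice = pick_host(block_hosts=["a1"], idle_hosts=["b1", "a2"], rack_of=rack_of)
```

Here `choice` is `("a2", "rack-local")`: no idle host holds the block, but a2 sits in the same rack as the replica, so the map input stays within the rack.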


Doug
