Jeff Hammerbacher wrote:
thanks for the quick reply.  this is an interesting scenario that you bring
up: a large hdfs pool across data centers with, perhaps, a jobtracker per
data center (or a jobtracker per rack).  i'm still not clear how
rack-locality helps map input performance here;

You could do it that way. However, I was imagining striping sub-clusters across racks so that, say, each of 4 sub-clusters contains 25% of the hosts in each rack.
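The striping idea can be shown with a small Python sketch. The function name `stripe_subclusters` and the rack/host names are hypothetical. It assumes hosts are dealt round-robin within each rack, so a rack with fewer hosts than sub-clusters would leave some sub-clusters without a host on that rack.

```python
from collections import defaultdict


def stripe_subclusters(racks: dict[str, list[str]], k: int) -> dict[int, list[str]]:
    """Split hosts into k sub-clusters so each gets roughly 1/k of every rack.

    Hypothetical helper: hosts are dealt round-robin within each rack.
    """
    subclusters: dict[int, list[str]] = defaultdict(list)
    for rack in sorted(racks):
        for i, host in enumerate(sorted(racks[rack])):
            # Host i of this rack goes to sub-cluster i mod k, so every
            # sub-cluster ends up with hosts on every (large enough) rack.
            subclusters[i % k].append(host)
    return dict(subclusters)


# Two racks of 8 hosts, striped into 4 sub-clusters.
racks = {
    "rack1": [f"rack1-h{i}" for i in range(8)],
    "rack2": [f"rack2-h{i}" for i in range(8)],
}
for sc, hosts in sorted(stripe_subclusters(racks, 4).items()):
    print(sc, hosts)  # first line: 0 ['rack1-h0', 'rack1-h4', 'rack2-h0', 'rack2-h4']
```

Each sub-cluster ends up with 2 of the 8 hosts on each rack, i.e. 25% of every rack.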

JobInProgress.findNewTask seems to see the world as local or other,
with no rack-awareness.  are you
referring to the ReplicationTargetChooser, which will always put one of the
three replicas (assuming your replication level is 3) on your rack, hence
increasing your chances of finding the block within your sub-jobtracker net?

That works too, but I was thinking that, even if no host in your sub-cluster holds a replica of a block, a map task could be placed on one of your hosts in a rack that contains the block, so the map input would be rack-local.
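The placement described here amounts to a three-tier preference: node-local first, then rack-local, then anywhere. The helper below is a hypothetical sketch, not Hadoop's scheduler code. `pick_host` is an invented name, and a plain `rack_of` dictionary stands in for a real topology lookup. It contrasts with the local-or-other view of `JobInProgress.findNewTask` described in the question above.

```python
from typing import Optional


def pick_host(
    free_hosts: list[str],
    replica_hosts: list[str],
    rack_of: dict[str, str],
) -> tuple[Optional[str], Optional[str]]:
    """Pick a free host for a map task, preferring node-local, then rack-local.

    Hypothetical helper; rack_of maps each host name to its rack id.
    """
    replica_set = set(replica_hosts)
    replica_racks = {rack_of[h] for h in replica_hosts}
    # 1. A free host that stores a replica: input is read from local disk.
    for host in free_hosts:
        if host in replica_set:
            return host, "node-local"
    # 2. A free host on a rack that stores a replica: input stays in-rack.
    for host in free_hosts:
        if rack_of[host] in replica_racks:
            return host, "rack-local"
    # 3. Otherwise any free host; the input crosses racks.
    if free_hosts:
        return free_hosts[0], "off-rack"
    return None, None


rack_of = {"a1": "r1", "a2": "r1", "b1": "r2", "b2": "r2", "c1": "r3"}
replicas = ["a1", "b1", "b2"]
# The sub-cluster's free hosts hold no replica, but a2 shares rack r1 with a1.
print(pick_host(["c1", "a2"], replicas, rack_of))  # ('a2', 'rack-local')
```

If sub-clusters are striped across racks, every rack holding a replica also holds some hosts from each sub-cluster, so step 2 can succeed whenever one of those hosts is free.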

Doug
