Jeff Hammerbacher wrote:
has anyone leveraged the ability of datanodes to specify which datacenter
and rack they live in?  if so, any evidence of performance improvements?  it
seems that rack-awareness is only leveraged in block replication, not in
task execution.

Rack-awareness often makes little difference for map input, since in the common configuration map tasks can nearly always be scheduled on nodes where the data is local. However, if you have a large HDFS cluster and overlay smaller MapReduce clusters over subsets of the hosts, then rack-locality can help map input performance too.

Doug
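
To make the distinction above concrete, here is a minimal sketch of how a scheduler might rank candidate hosts for a map task by locality. It is an illustration only, not Hadoop's actual scheduler code: the TOPOLOGY table, the host and rack names, and the functions locality() and pick_host() are all hypothetical.

```python
# Hypothetical host -> rack mapping; in a real cluster this would come
# from whatever topology information the datanodes report.
TOPOLOGY = {
    "host1": "/dc1/rack1",
    "host2": "/dc1/rack1",
    "host3": "/dc1/rack2",
    "host4": "/dc1/rack2",
}

# Lower rank means cheaper input reads.
RANK = {"node-local": 0, "rack-local": 1, "off-rack": 2}


def locality(task_host, replica_hosts, topology=TOPOLOGY):
    """Classify how close task_host is to any replica of the input block."""
    if task_host in replica_hosts:
        return "node-local"
    task_rack = topology[task_host]
    if any(topology[h] == task_rack for h in replica_hosts):
        return "rack-local"
    return "off-rack"


def pick_host(candidate_hosts, replica_hosts, topology=TOPOLOGY):
    """Pick the candidate with the best locality (first one wins on ties)."""
    return min(
        candidate_hosts,
        key=lambda h: RANK[locality(h, replica_hosts, topology)],
    )
```

If the MapReduce cluster covers every HDFS host, some candidate is almost always node-local and the rack-local case rarely matters. If it covers only a subset, say host2 and host3 while the block lives only on host1, then pick_host(["host3", "host2"], ["host1"]) prefers host2 because it shares a rack with the data.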
