I used to think that the notion of a rack would be useful to exploit in a
rack-level combiner (aggregating data before shipping it off the rack).

But apparently Google doesn't do this (at least according to what some people told me).
Any thoughts on this list? 
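To make the idea concrete, here is a minimal, hypothetical sketch of a rack-level combiner. It assumes word-count-style (key, count) map outputs, a known host-to-rack mapping (`rack_of`), and addition as the combine function. The function name and inputs are made up for illustration and are not a Hadoop API. It only shows the aggregation step, not where the per-rack combining would actually run.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def rack_level_combine(
    map_outputs: Dict[str, List[Tuple[str, int]]],
    rack_of: Dict[str, str],
) -> Dict[str, Counter]:
    """Sum (key, count) map outputs per rack before they leave the rack.

    Hypothetical sketch: map_outputs maps each host to the (key, value)
    pairs its map tasks emitted, and the combine function is addition.
    """
    per_rack: Dict[str, Counter] = defaultdict(Counter)
    for host, pairs in map_outputs.items():
        rack = rack_of[host]
        for key, value in pairs:
            # Merge records from every host on the same rack.
            per_rack[rack][key] += value
    return dict(per_rack)
```

With outputs `{"a1": [("the", 1), ("cat", 1)], "a2": [("the", 1)], "b1": [("the", 1)]}`, where a1 and a2 share rack /r1 and b1 sits on /r2, the four raw records shrink to three combined records that would cross rack boundaries.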


-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 17, 2007 8:45 PM
To: [email protected]
Subject: Re: rack-awareness for hdfs

Jeff Hammerbacher wrote:
> has anyone leveraged the ability of datanodes to specify which datacenter
> and rack they live in?  if so, any evidence of performance improvements?  it
> seems that rack-awareness is only leveraged in block replication, not in
> task execution.

It often doesn't make a big improvement for map input, since in the 
common configuration, map tasks can nearly always be scheduled on nodes 
where the data is local.  However, if you have a large HDFS cluster and 
overlay smaller mapreduce clusters over subsets of the hosts, then 
rack-locality can help map input performance too.
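A minimal sketch of the locality preference described above: node-local first, then rack-local, then off-rack. This is an illustration under stated assumptions, not Hadoop's actual scheduler. The function `pick_map_host`, the host names and the `rack_of` mapping are hypothetical.

```python
from typing import Dict, List, Optional, Set


def pick_map_host(
    block_replicas: List[str],
    free_hosts: Set[str],
    rack_of: Dict[str, str],
) -> Optional[str]:
    """Pick a host for one map task, preferring data locality.

    Hypothetical preference order (illustrative only):
    1. node-local: a free host that stores a replica of the input block
    2. rack-local: a free host on the same rack as some replica
    3. off-rack: any other free host
    """
    # 1. Node-local: the task reads its input from local disk.
    for host in block_replicas:
        if host in free_hosts:
            return host
    # 2. Rack-local: input crosses only the rack switch.
    replica_racks = {rack_of[h] for h in block_replicas if h in rack_of}
    for host in sorted(free_hosts):
        if rack_of.get(host) in replica_racks:
            return host
    # 3. Off-rack: input has to cross the core network.
    return min(free_hosts) if free_hosts else None
```

This mirrors the overlay case: if HDFS spans hosts a1 and c1 but the mapreduce cluster only runs on a2 and b2, no task can be node-local. Rack-awareness still lets the scheduler pick a2, which shares rack /r1 with the replica on a1.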

Doug
