I used to think that the notion of a rack would be useful to exploit in a rack-level combiner (aggregating data before shipping it off-rack).
But apparently Google doesn't do this (at least that's what some people told me). Any thoughts from this list?

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Monday, September 17, 2007 8:45 PM
To: [email protected]
Subject: Re: rack-awareness for hdfs

Jeff Hammerbacher wrote:
> has anyone leveraged the ability of datanodes to specify which datacenter
> and rack they live in? if so, any evidence of performance improvements? it
> seems that rack-awareness is only leveraged in block replication, not in
> task execution.

It often doesn't make a big improvement for map input, since in the common configuration, map tasks can nearly always be scheduled on nodes where the data is local. However, if you have a large HDFS cluster and overlay smaller mapreduce clusters over subsets of the hosts, then rack-locality can help map input performance too.

Doug
