Re: rack-awareness for hdfs

Owen O'Malley Tue, 18 Sep 2007 09:39:06 -0700

On Sep 18, 2007, at 9:28 AM, Ted Dunning wrote:


The key here is that the task farm need not coincide exactly with the
storage farm.

On a large run with an identical hdfs/mapreduce cluster, we see very
high (95%) mapper locality. However, it is usually the case that the
hdfs cluster is larger than the map/reduce cluster, so it would be good
to make the map placement rack-aware, and that is a recognized goal.


There are a couple of issues with the goal:

1. The network topology is currently hdfs centric and needs to be
   generalized. There is a jira for this.
2. The filesystem interface needs to provide rack and node
   placement information.
3. The input split interface needs to be generalized to deal with
   racks as well as nodes.
4. The job tracker needs to use the rack information when it
   assigns map tasks.
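
To make items 3 and 4 concrete, here is a rough sketch of the kind of
choice a rack-aware job tracker could make. It is not Hadoop code; the
names Split, locality, pick_split and the rack_of map are all made up
for illustration, and it assumes each split already knows which hosts
hold its data and that a host-to-rack map is available from somewhere.

```python
from collections import namedtuple

# Hypothetical split record: a name plus the hosts that hold its data.
Split = namedtuple("Split", ["name", "hosts"])


def locality(split, tracker_host, rack_of):
    """Return 0 for node-local, 1 for rack-local, 2 for off-rack."""
    if tracker_host in split.hosts:
        return 0
    tracker_rack = rack_of.get(tracker_host)
    # A tracker with no known rack can never be rack-local.
    if tracker_rack is not None and any(
        rack_of.get(host) == tracker_rack for host in split.hosts
    ):
        return 1
    return 2


def pick_split(pending, tracker_host, rack_of):
    """Pick the pending split closest to the requesting task tracker.

    Ties go to the split that appears first in the pending list.
    """
    if not pending:
        return None
    return min(pending, key=lambda s: locality(s, tracker_host, rack_of))


# Assumed topology: dn1 and dn2 store data on rack1, dn3 on rack2,
# and tt1 is a compute-only node on rack1 that stores no data.
rack_of = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "tt1": "/rack1"}
pending = [Split("a", ["dn3"]), Split("b", ["dn1"]), Split("c", ["dn2"])]
```

With that topology, a tracker on dn2 gets split "c" (node-local), while
the compute-only tracker tt1 gets "b" (rack-local) rather than the
off-rack split "a". That second case is the one that matters when the
task farm is not the same set of machines as the storage farm.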

It is not on my short-term radar, but it is on the medium-term radar.
However, patches are welcome! *smile*


-- Owen
