> 1. The network topology is currently hdfs centric and needs to be generalized. There is a jira for this.

This is no longer an issue. HADOOP-1266 has a patch that's committed.

Hairong

-----Original Message-----
From: Owen O'Malley [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 18, 2007 9:38 AM
To: [email protected]
Subject: Re: rack-awareness for hdfs

On Sep 18, 2007, at 9:28 AM, Ted Dunning wrote:

> The key here is that the task farm need not coincide exactly with the
> storage farm.

On a large run with an identical hdfs/mapreduce cluster, we see very high (95%) mapper locality. However, it is usually the case that the hdfs cluster is larger than the map/reduce cluster, so it would be good to make the map placement rack-aware, and that is a recognized goal. There are a couple of issues with the goal:

1. The network topology is currently hdfs centric and needs to be generalized. There is a jira for this.
2. The filesystem interface needs to provide rack and node placement information.
3. The input split interface needs to be generalized to deal with racks as well as nodes.
4. The job tracker needs to make use of the rack information.

It is not on my short term radar, but it is on the medium term radar. However, patches are welcome! *smile*

-- Owen
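
For illustration only, here is a rough sketch of how the four items above might fit together once rack information is available to the scheduler. It is not Hadoop code and does not use any Hadoop API: the names `Locality`, `locality`, `pick_split`, the `rack_of` host-to-rack mapping, and the rack paths in the usage note are all hypothetical. It assumes each pending split is described by the hosts holding its replicas, and that a worker asking for work should get a node-local split if one exists, then a rack-local one, then anything.

```python
from enum import IntEnum


class Locality(IntEnum):
    """Hypothetical locality levels, ordered from best to worst."""
    NODE_LOCAL = 0
    RACK_LOCAL = 1
    OFF_RACK = 2


def locality(worker_host, replica_hosts, rack_of):
    """Classify how close worker_host is to the nearest replica of a split.

    rack_of is an assumed mapping from host name to rack name; hosts that
    are missing from it are treated as being on an unknown rack.
    """
    if worker_host in replica_hosts:
        return Locality.NODE_LOCAL
    worker_rack = rack_of.get(worker_host)
    replica_racks = {rack_of.get(host) for host in replica_hosts}
    replica_racks.discard(None)
    if worker_rack is not None and worker_rack in replica_racks:
        return Locality.RACK_LOCAL
    return Locality.OFF_RACK


def pick_split(worker_host, pending_splits, rack_of):
    """Return the id of the pending split closest to worker_host.

    pending_splits maps a split id to the list of hosts holding its
    replicas. Returns None when there is nothing left to schedule.
    """
    if not pending_splits:
        return None
    return min(
        pending_splits,
        key=lambda split_id: locality(
            worker_host, pending_splits[split_id], rack_of
        ),
    )
```

With a mapping such as `{"a1": "/rack-a", "a2": "/rack-a", "b1": "/rack-b"}`, a worker on `a1` would be handed a split stored on `a1` first, then one stored on `a2`, and only then one stored on `b1`. This matters most in the case described above, where the hdfs cluster is larger than the map/reduce cluster and many splits have no replica on any task node.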
