I think I get the point now. The RM has no mapping from a task attempt to the hosts that hold its input file. Suppose the AM sent only one resource request for the attempt, naming the location of just one replica, and the RM could not schedule a container on that node because the node was busy. In that scenario the scheduler would provide resources elsewhere on the same rack or on a different rack, which means data locality is lost.
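A minimal sketch of that scenario in Java, as a toy model and not the real scheduler code. The class, the method names, and the h1..h4 topology are all made up for illustration. It only shows how a granted node looks from the RM's side, depending on which hosts the request named.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from YARN.
class LocalitySketch {
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    // Hypothetical topology: replicas of the input file live on h1, h2, h3.
    static Map<String, String> demoTopology() {
        Map<String, String> rackOf = new HashMap<>();
        rackOf.put("h1", "/rack1");
        rackOf.put("h2", "/rack1");
        rackOf.put("h3", "/rack1");
        rackOf.put("h4", "/rack2");
        return rackOf;
    }

    // Classify the granted node against the hosts the request named.
    // The classifier only knows what the request told it.
    static Locality classify(String grantedHost, List<String> requestedHosts,
                             Map<String, String> rackOf) {
        if (requestedHosts.contains(grantedHost)) {
            return Locality.NODE_LOCAL;
        }
        String grantedRack = rackOf.get(grantedHost);
        for (String host : requestedHosts) {
            if (grantedRack != null && grantedRack.equals(rackOf.get(host))) {
                return Locality.RACK_LOCAL;
            }
        }
        return Locality.OFF_SWITCH;
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = demoTopology();
        // Request names only one replica host (h1); h1 is busy, h2 is granted.
        // h2 does hold a replica, but the request never said so.
        System.out.println(classify("h2", Arrays.asList("h1"), rackOf));             // RACK_LOCAL
        // Request names all three replica hosts: the same grant is data-local.
        System.out.println(classify("h2", Arrays.asList("h1", "h2", "h3"), rackOf)); // NODE_LOCAL
        // A node on another rack is off-switch either way.
        System.out.println(classify("h4", Arrays.asList("h1", "h2", "h3"), rackOf)); // OFF_SWITCH
    }
}
```

The first two calls grant the same node (h2). Only the second request named every replica host, so only there does the grant count as data-local.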
However, a better alternative is to try the host of another replica of the input file. So the compromise is to submit all 5 resource requests at the same time, then reserve the best allocation and release the others with the help of an <attempt id, container request> map.

Thanks,
Min

On Sat, Jun 16, 2012 at 11:09 PM, Min Zhou <coderp...@gmail.com> wrote:

> Hi, all,
>
> RMContainerRequestor creates more than one resource request for each task
> attempt.
> If we have an HDFS file with 3 replicas, on which a task attempt would
> run in a single-rack cluster, the code below would create
> 3 (hosts) + 1 (rack) + 1 (any) = 5 container requests for the attempt.
>
>   protected void addContainerReq(ContainerRequest req) {
>     // Create resource requests
>     for (String host : req.hosts) {
>       // Data-local
>       if (!isNodeBlacklisted(host)) {
>         addResourceRequest(req.priority, host, req.capability);
>       }
>     }
>
>     // Nothing Rack-local for now
>     for (String rack : req.racks) {
>       addResourceRequest(req.priority, rack, req.capability);
>     }
>
>     // Off-switch
>     addResourceRequest(req.priority, ANY, req.capability);
>   }
>
> IMHO, there should be only one for each attempt. Would MRAppMaster launch
> 5 containers for it?
> How does MRv2 guarantee that there would be only one for the attempt?
> Can anyone kindly explain the logic on this point for me?
>
> Thanks,
> Min
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com