Hi, I am trying to understand more about Hadoop Next Gen MapReduce and have a few questions based on the following post:
http://developer.yahoo.com/blogs/hadoop/posts/2011/03/mapreduce-nextgen-scheduler/

[1] How does the application decide how many containers it needs? Are the containers used to store the intermediate results at the map nodes?

[2] During resource allocation, if the resource manager has no mapping between map tasks and the resources allocated, how can it allocate the right resources? It might end up allocating resources on a node that does not have the data for the map task, which is not optimal. In that case the Application Master would have to reject the allocation and request again. There could be considerable back-and-forth between the Application Master and the resource manager before they converge. Is this right?

Thanks!