[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428412#comment-13428412
 ] 

Siddharth Seth commented on MAPREDUCE-3902:
-------------------------------------------

@Tsuyoshi, I'd spoken with Vinod and others about this a while ago, and should 
have posted this earlier. Adding the functionality to the AM in its current 
state is possible, but it would further complicate some components which are 
already quite complicated and tough to change.

The TaskAttempt state machine is currently a mix of TaskAttempt transitions 
and Container transitions. The RMContainerAllocator is also dealing with more 
than it should: Nodes, Containers, as well as scheduling.

The idea was to split the functionality into separate TaskAttempt, Container 
and Node state machines, along with reduced functionality in the scheduler 
(also decoupling the RM request and AM scheduling). This would make the code 
cleaner and make re-use (as well as other improvements like handling retired 
nodes) easier to implement.
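
To make the split a bit more concrete, here is a rough sketch of what three 
independent state machines could look like. It is only an illustration: every 
class, enum and method name below is hypothetical, none of it is actual Hadoop 
code, and the states are a simplified guess rather than the real transitions.

```java
import java.util.ArrayList;
import java.util.List;

// All names below are hypothetical; none are actual Hadoop classes.
enum ContainerState { ALLOCATED, RUNNING_TASK, IDLE, STOPPED }
enum AttemptState { NEW, ASSIGNED, RUNNING, SUCCEEDED }
enum NodeState { USABLE, UNUSABLE }

// Container lifecycle only: knows nothing about what the task does.
class SketchContainer {
    private ContainerState state = ContainerState.ALLOCATED;
    private int attemptsRun = 0;

    ContainerState state() { return state; }
    int attemptsRun() { return attemptsRun; }

    // Free for work when freshly allocated, or idle after an earlier attempt.
    boolean canAccept() {
        return state == ContainerState.ALLOCATED || state == ContainerState.IDLE;
    }

    void assign() {
        if (!canAccept()) {
            throw new IllegalStateException("cannot assign in state " + state);
        }
        state = ContainerState.RUNNING_TASK;
        attemptsRun++;
    }

    // The attempt ended, but the container stays up and becomes reusable.
    void attemptFinished() {
        if (state != ContainerState.RUNNING_TASK) {
            throw new IllegalStateException("no attempt running in state " + state);
        }
        state = ContainerState.IDLE;
    }

    void stop() { state = ContainerState.STOPPED; }
}

// Attempt lifecycle only: it borrows a container and hands it back.
class SketchAttempt {
    private AttemptState state = AttemptState.NEW;
    private SketchContainer container;

    AttemptState state() { return state; }

    void assignTo(SketchContainer c) {
        expect(AttemptState.NEW);
        c.assign();
        container = c;
        state = AttemptState.ASSIGNED;
    }

    void launched() {
        expect(AttemptState.ASSIGNED);
        state = AttemptState.RUNNING;
    }

    void succeeded() {
        expect(AttemptState.RUNNING);
        state = AttemptState.SUCCEEDED;
        container.attemptFinished();
    }

    private void expect(AttemptState wanted) {
        if (state != wanted) {
            throw new IllegalStateException("expected " + wanted + " but was " + state);
        }
    }
}

// Node lifecycle only: losing a node stops every container tracked on it.
class SketchNode {
    private NodeState state = NodeState.USABLE;
    private final List<SketchContainer> containers = new ArrayList<>();

    NodeState state() { return state; }
    void track(SketchContainer c) { containers.add(c); }

    void markUnusable() {
        state = NodeState.UNUSABLE;
        for (SketchContainer c : containers) {
            c.stop();
        }
    }
}

class ReuseSketch {
    public static void main(String[] args) {
        SketchNode node = new SketchNode();
        SketchContainer container = new SketchContainer();
        node.track(container);

        // Two attempts run back to back in the same container.
        for (int i = 0; i < 2; i++) {
            SketchAttempt attempt = new SketchAttempt();
            attempt.assignTo(container);
            attempt.launched();
            attempt.succeeded();
        }
        System.out.println("attempts run in one container: " + container.attemptsRun());
    }
}
```

In this sketch a finished attempt returns its container to IDLE instead of 
stopping it, so re-use comes from the Container transitions alone and the 
TaskAttempt transitions need no special case for it.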

I had worked with Vinod on the state transitions, and have been working on the 
implementation in bits and pieces to see how feasible it is. The code is at 
https://github.com/sidseth/h2-container-reuse . It's a bit of a mess at the 
moment, with lots of TODOs, etc. scattered all over, but is just about 
functional. There's no explicit re-use scheduling yet, but re-use can be 
tested by running a job which requires more containers than are available on 
the cluster (and with some config changes).

bq. the 2nd topic(combining per container) should be moved, because the change 
seems to be too big.
I believe this was, at least initially, meant to ensure that output from all 
taskAttempts in one container would be fetched only once by a reducer (without 
a common combiner). Either way, that could be a separate jira.
                
> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3902
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster, mrv2
>            Reporter: Arun C Murthy
>            Assignee: Siddharth Seth
>         Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps) 


        
