[
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428412#comment-13428412
]
Siddharth Seth commented on MAPREDUCE-3902:
-------------------------------------------
@Tsuyoshi, I'd spoken with Vinod and others about this a while ago and should
have posted this earlier. Adding the functionality to the AM in its current
state is possible, but it would further complicate some components that are
already quite complicated and tough to change.
The TaskAttempt state machine is currently a mix of TaskAttempt transitions
and Container transitions. The RMContainerAllocator is also dealing with more
than it should: Nodes, Containers, and scheduling.
The idea was to split the functionality into separate TaskAttempt, Container
and Node state machines, along with reduced functionality in the scheduler (also
decoupling the RM request and AM scheduling). This would make the code cleaner
and make re-use (as well as other improvements like handling retired nodes)
easier to implement.
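To make the proposed split a little more concrete, here is a minimal sketch of three independent state machines. It is not taken from the actual branch: the class name `SplitStateMachines`, the `Machine` helper, and every state name below are hypothetical, chosen only to show how a Container could cycle between idle and busy (i.e. be re-used) without the TaskAttempt machine knowing about container lifecycle.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class SplitStateMachines {
    // Hypothetical states; the real AM state machines have many more.
    public enum AttemptState { NEW, WAITING_FOR_CONTAINER, RUNNING, SUCCEEDED, FAILED }
    public enum ContainerState { ALLOCATED, LAUNCHING, IDLE, BUSY, STOPPED }
    public enum NodeState { ACTIVE, RETIRED }

    /** A tiny table-driven state machine that rejects undeclared transitions. */
    public static final class Machine<S extends Enum<S>> {
        private final Class<S> type;
        private final Map<S, Set<S>> allowed;
        private S current;

        public Machine(Class<S> type, S initial) {
            this.type = type;
            this.allowed = new EnumMap<>(type);
            this.current = initial;
        }

        public Machine<S> allow(S from, S to) {
            allowed.computeIfAbsent(from, k -> EnumSet.noneOf(type)).add(to);
            return this;
        }

        public void transition(S to) {
            Set<S> targets = allowed.get(current);
            if (targets == null || !targets.contains(to)) {
                throw new IllegalStateException(current + " -> " + to + " is not allowed");
            }
            current = to;
        }

        public S state() {
            return current;
        }
    }

    public static Machine<AttemptState> newAttemptMachine() {
        return new Machine<>(AttemptState.class, AttemptState.NEW)
            .allow(AttemptState.NEW, AttemptState.WAITING_FOR_CONTAINER)
            .allow(AttemptState.WAITING_FOR_CONTAINER, AttemptState.RUNNING)
            .allow(AttemptState.RUNNING, AttemptState.SUCCEEDED)
            .allow(AttemptState.RUNNING, AttemptState.FAILED);
    }

    public static Machine<ContainerState> newContainerMachine() {
        return new Machine<>(ContainerState.class, ContainerState.ALLOCATED)
            .allow(ContainerState.ALLOCATED, ContainerState.LAUNCHING)
            .allow(ContainerState.LAUNCHING, ContainerState.IDLE)
            .allow(ContainerState.IDLE, ContainerState.BUSY)   // attempt assigned
            .allow(ContainerState.BUSY, ContainerState.IDLE)   // attempt done, container re-usable
            .allow(ContainerState.IDLE, ContainerState.STOPPED);
    }

    public static Machine<NodeState> newNodeMachine() {
        return new Machine<>(NodeState.class, NodeState.ACTIVE)
            .allow(NodeState.ACTIVE, NodeState.RETIRED);
    }

    /** Runs the given number of attempts back to back in a single container. */
    public static int runAttemptsInOneContainer(int attempts) {
        Machine<ContainerState> container = newContainerMachine();
        container.transition(ContainerState.LAUNCHING);
        container.transition(ContainerState.IDLE);
        int succeeded = 0;
        for (int i = 0; i < attempts; i++) {
            Machine<AttemptState> attempt = newAttemptMachine();
            attempt.transition(AttemptState.WAITING_FOR_CONTAINER);
            container.transition(ContainerState.BUSY);
            attempt.transition(AttemptState.RUNNING);
            attempt.transition(AttemptState.SUCCEEDED);
            container.transition(ContainerState.IDLE);
            succeeded++;
        }
        container.transition(ContainerState.STOPPED);
        return succeeded;
    }
}
```

In this sketch, calling `SplitStateMachines.runAttemptsInOneContainer(3)` drives three attempts through one container, and an undeclared transition such as ALLOCATED to BUSY fails with an `IllegalStateException`.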
I had worked with Vinod on the state transitions, and have been working on the
implementation in bits and pieces to see how feasible it is. The code is at
https://github.com/sidseth/h2-container-reuse . It's a bit of a mess at the
moment, with lots of TODOs etc. scattered throughout, but it is just about
functional. There's no explicit re-use scheduling yet, but re-use can be
tested by running a job which requires more containers than are available on
the cluster (and with some config changes).
bq. the 2nd topic(combining per container) should be moved, because the change
seems to be too big.
I believe this was, at least initially, meant to ensure that output from all
taskAttempts in one container would be fetched only once by a reducer (without
a common combiner). Either way, that could be a separate JIRA.
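As a rough illustration of the "fetch once per container" idea, the sketch below groups map attempt IDs by the container that ran them, so a reducer would issue one fetch per container rather than one per attempt. The class `PerContainerFetchPlan`, its methods, and the string IDs are all hypothetical; this says nothing about how the actual shuffle would be changed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PerContainerFetchPlan {
    /**
     * Groups attempt IDs by container ID.
     * The input maps attemptId -> containerId (both hypothetical strings).
     */
    public static Map<String, List<String>> groupByContainer(Map<String, String> attemptToContainer) {
        Map<String, List<String>> byContainer = new TreeMap<>();
        for (Map.Entry<String, String> e : attemptToContainer.entrySet()) {
            byContainer.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return byContainer;
    }

    /** Number of fetches a reducer would need if it fetched once per container. */
    public static int fetchCount(Map<String, String> attemptToContainer) {
        return groupByContainer(attemptToContainer).size();
    }
}
```

With three attempts spread over two containers, `fetchCount` would report two fetches instead of three.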
> MR AM should reuse containers for map tasks, there-by allowing fine-grained
> control on num-maps for users without need for CombineFileInputFormat etc.
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster, mrv2
> Reporter: Arun C Murthy
> Assignee: Siddharth Seth
> Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks.
> This is something similar to JVM re-use we had in 0.20.x, but in a
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole
> container at once (i.e. all maps)