[ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452640#comment-13452640 ]

Tsuyoshi OZAWA commented on MAPREDUCE-4502:
-------------------------------------------

Karthik, thanks for your comment.

bq. Is the local aggregation done asynchronously as the mappers process 
respective input?

Partially, yes. Local aggregation can start as soon as at least two map tasks
have finished.
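A minimal sketch of the "at least two finished map tasks" condition, assuming
map outputs are modeled as plain key-to-count dicts and the combiner is
addition (word-count style). `combine` and `try_local_aggregation` are
hypothetical names, not part of the proposed patch or of Hadoop's API.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical model: a map output is a dict of key -> partial count,
# and the combiner simply adds counts for the same key.
MapOutput = Dict[str, int]


def combine(a: MapOutput, b: MapOutput) -> MapOutput:
    """Merge two map outputs with a summing combiner (inputs are not mutated)."""
    merged = Counter(a)
    merged.update(b)  # Counter.update adds counts instead of replacing them
    return dict(merged)


def try_local_aggregation(finished: List[MapOutput]) -> List[MapOutput]:
    """Combine finished map outputs once at least two are available.

    With fewer than two outputs there is nothing to merge, so the list is
    returned unchanged; otherwise all outputs are folded into a single one.
    """
    if len(finished) < 2:
        return finished
    result = finished[0]
    for output in finished[1:]:
        result = combine(result, output)
    return [result]


# One finished map: nothing to aggregate yet.
print(try_local_aggregation([{"a": 1, "b": 2}]))
# A second map finishes: the two outputs are combined.
print(try_local_aggregation([{"a": 1, "b": 2}, {"a": 3}]))
```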

bq. Will there be one LocalAggregator and one ShuffleHandler per each reducer? 
Or, is it a single LocalAggregator/ShuffleHandler daemon with relevant 
thread(pool)s per container?

The latter.

Ideally, we want to minimize code modifications and maximize performance. In
the current MR implementation, a ShuffleHandler is launched per container, and
keeping that arrangement minimizes the code changes needed.
BTW, a multi-threaded LocalAggregator would be very effective for performance;
however, its implementation can be more complex than a single-threaded one. As
a first step, it is reasonable to implement a single-threaded version.
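The single-threaded first step can be pictured as one worker thread draining a
queue of finished map outputs. This is a hypothetical sketch, assuming the same
dict-of-counts model and a summing combiner; `SingleThreadedAggregator` is an
illustrative name, not the real LocalAggregator.

```python
import queue
import threading
from collections import Counter
from typing import Dict

MapOutput = Dict[str, int]


class SingleThreadedAggregator:
    """Hypothetical single-threaded local aggregator.

    One worker thread drains a queue of finished map outputs and folds them
    into a running aggregate with a summing combiner.
    """

    _STOP = object()  # sentinel telling the worker thread to exit

    def __init__(self) -> None:
        self._queue: "queue.Queue[object]" = queue.Queue()
        self._aggregate: Counter = Counter()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, output: MapOutput) -> None:
        """Called when a map task finishes; only enqueues, never aggregates."""
        self._queue.put(output)

    def _run(self) -> None:
        # Only this thread touches self._aggregate, so no lock is needed.
        while True:
            item = self._queue.get()
            if item is self._STOP:
                return
            self._aggregate.update(item)  # combiner: add partial counts

    def close(self) -> MapOutput:
        """Stop the worker once all queued outputs (FIFO) have been combined."""
        self._queue.put(self._STOP)
        self._worker.join()
        return dict(self._aggregate)


agg = SingleThreadedAggregator()
agg.submit({"a": 1})
agg.submit({"a": 2, "b": 1})
print(agg.close())
```

Because only the worker thread mutates the running aggregate, no locking is
needed around it. A multi-threaded variant would presumably need partitioned
aggregates or locks, which is one source of the extra complexity mentioned
above.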

bq. The current design doc seems to be aimed at aggregation per container. The 
bigger goal being aggregation at node/rack levels, does the same design 
extend/apply to the final goal?

I think node-level aggregation and container-level aggregation are exactly the
same in MR 2.0.

To generalize this design to rack-level aggregation, we need a special
Reducer-like task that fetches map outputs and reduces them, but writes its
output to local disk rather than to HDFS. Rack-level aggregation can then be
implemented by extending the new APIs between mappers and reducers so that
they can launch such special tasks and delegate the aggregation to them.
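A rough sketch of such a Reducer-like special task, assuming outputs are dicts
of counts, a summing reduce function, and a caller-supplied `fetch` callable
standing in for the shuffle fetch. `rack_aggregate` and `fetch` are
hypothetical; the point is only that the reduced result goes to a file on
local disk instead of HDFS.

```python
import os
import tempfile
from collections import Counter
from typing import Callable, Dict, Iterable

MapOutput = Dict[str, int]


def rack_aggregate(
    node_ids: Iterable[str],
    fetch: Callable[[str], MapOutput],
    local_dir: str,
) -> str:
    """Hypothetical rack-level aggregation task.

    Like a reducer, it fetches outputs (here, node-level aggregates) and
    reduces them. Unlike a reducer, it writes the result to local disk, so
    the final reducers can fetch it later just as they fetch map output.
    """
    total: Counter = Counter()
    for node in node_ids:
        total.update(fetch(node))  # fetch() is a stand-in for the shuffle fetch
    fd, path = tempfile.mkstemp(prefix="rack-agg-", suffix=".txt", dir=local_dir)
    with os.fdopen(fd, "w") as f:
        for key in sorted(total):  # key-ordered, like reducer input
            f.write(f"{key}\t{total[key]}\n")
    return path


node_outputs = {"node1": {"a": 1, "b": 2}, "node2": {"a": 5}}
out_path = rack_aggregate(node_outputs, node_outputs.__getitem__, tempfile.mkdtemp())
with open(out_path) as f:
    print(f.read())
```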

Please ask me if you have any questions.
- Tsuyoshi
                
> Multi-level aggregation with combining the result of maps per node/rack
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4502
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster, mrv2
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>         Attachments: speculative_draft.pdf
>
>
> The shuffle cost is high in Hadoop despite the existence of the combiner,
> because the scope of combining is limited to a single MapTask.
> A good way to solve this problem is to aggregate the results of maps per
> node/rack by launching a combiner.
> This JIRA is to implement the multi-level aggregation infrastructure,
> including combining per container (MAPREDUCE-3902 is related) and
> coordinating containers by the application master without breaking the
> fault tolerance of jobs.
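
To make the motivation concrete, here is a toy count of shuffled records,
assuming word-count-style outputs in which maps on the same node emit
overlapping keys. The helper names are hypothetical, and real savings depend
on how much the keys overlap across maps.

```python
from typing import Dict, List

MapOutput = Dict[str, int]


def records_shuffled_per_map_combining(node_maps: List[List[MapOutput]]) -> int:
    """Each map's combiner runs in isolation: one record per key per map."""
    return sum(len(out) for maps in node_maps for out in maps)


def records_shuffled_per_node_combining(node_maps: List[List[MapOutput]]) -> int:
    """All map outputs on a node are combined: one record per key per node."""
    return sum(len(set().union(*maps)) for maps in node_maps)


# Two nodes; each inner list holds the (already combined) outputs of its maps.
node_maps = [
    [{"the": 4, "a": 2}, {"the": 3, "a": 1}, {"the": 5, "a": 2}],
    [{"the": 1, "a": 1}, {"the": 2}],
]
print(records_shuffled_per_map_combining(node_maps))   # 9
print(records_shuffled_per_node_combining(node_maps))  # 4
```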
