[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645751#comment-13645751
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-4584:
-------------------------------------------

I agree with your strategy. I'm working on MAPREDUCE-4502, which is related to 
your work, but the patch has become too large to review. I now plan to split 
it into smaller patches, but your changes affect my work, so I'd like to 
align with your strategy. Essentially, your proposal and node-level map-side 
aggregation (MAPREDUCE-4502) complement each other, so the performance gain 
could be much larger if all of these features are included in MapReduce.

One proposal: use node-level aggregation as an optimization for reducer-side 
preemption. If many IFiles need to be fetched and the job is an 
aggregation-type job, mapper-side aggregation reduces the amount of data to 
fetch more effectively than fetching in parallel with reducer preemption. 
The two features could cooperate, or we could switch between them as a 
strategy. Any ideas?
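
To make the switching idea concrete, here is a rough sketch of the decision 
only. Every name in it (FetchStrategy, FetchStrategyChooser, 
MANY_IFILES_THRESHOLD) and the threshold value are hypothetical; none of them 
exist in Hadoop or in either patch.

```java
// Hypothetical sketch: these types are illustrative, not Hadoop APIs.
enum FetchStrategy { NODE_LEVEL_AGGREGATION, REDUCER_PREEMPTION }

final class FetchStrategyChooser {
    // Assumed cut-off for "a lot of IFiles"; a real value would need measurement.
    static final int MANY_IFILES_THRESHOLD = 1000;

    private FetchStrategyChooser() {}

    // Prefer mapper-side (node-level) aggregation only when the job aggregates
    // and the number of IFiles still to fetch is large; otherwise fall back to
    // parallel fetching via reducer preemption.
    static FetchStrategy choose(int pendingIFiles, boolean aggregationJob) {
        if (aggregationJob && pendingIFiles >= MANY_IFILES_THRESHOLD) {
            return FetchStrategy.NODE_LEVEL_AGGREGATION;
        }
        return FetchStrategy.REDUCER_PREEMPTION;
    }
}
```

This covers only the "switching" variant; having the two features cooperate 
would need a different design.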
                
> Umbrella: Preemption and restart of MapReduce tasks
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-4584
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4584
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: applicationmaster, mrv2, performance, resourcemanager, 
> task
>            Reporter: Sriram Rao
>            Assignee: Chris Douglas
>
> This JIRA will track the implementation of improvements to the handling of 
> intermediate data (e.g., map output). Specifically, it tracks changes in 
> support of preempting running tasks, checkpointing completed work, and 
> spawning one or more tasks to complete the original split/partition. These 
> mechanisms allow one to manage skew in intermediate data, respond to resource 
> abundance or scarcity (particularly with preemption), speculatively execute 
> on the remaining work from checkpointed tasks, and automatically tune 
> parameters for performance.
> Iterations will build on learnings from previous work, including the 
> following:
> Technical reports:
> http://research.yahoo.com/files/yl-2012-002.pdf
> http://research.yahoo.com/files/yl-2012-003.pdf
> Source code:
> http://code.google.com/p/sailfish
