[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851983#action_12851983
 ] 

Scott Chen commented on MAPREDUCE-1463:
---------------------------------------

I think improving the timing for launching reducers is not just for small jobs.
In the case of FairScheduler, for larger jobs with 10000+ mappers, the mappers 
need several batches to be fully scheduled.
In this case, if we launch the reducers when 5% of the mappers have finished, 
those reducers will just sit idle.

Here is the trade-off.
If we launch the reducers too late, we lose the overlap between mapper 
execution and reducer shuffling.
But if we launch the reducers too early, we waste reducer slots because they 
have to wait for the mappers to finish.

The optimal case is that we launch the reducers as late as possible while the 
reducer shuffling phase still finishes right after the last mapper finishes.

The goal is to somehow estimate the mapper finish time based on the information 
we have and launch the reducers at the right moment.
I think this decision should depend on the TaskScheduler because different 
scheduling policies affect the mapper finish time.
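
To make the idea concrete, here is a rough sketch of one possible heuristic. 
It is not what the attached patches do. The class and method names are 
hypothetical, and it assumes two things that are not established above: that 
the mapper finish time can be extrapolated linearly from the completion rate 
observed so far, and that an estimate of the shuffle duration is available 
from somewhere else.

```java
// Hypothetical sketch only: none of these names exist in Hadoop.
final class ReducerLaunchEstimator {

    // Estimate the milliseconds left until all mappers finish by linearly
    // extrapolating the completion rate observed so far. Because the rate is
    // measured on the running job, it already reflects how the scheduler
    // hands out map slots in batches.
    static double estimateRemainingMapMillis(int finishedMaps, int totalMaps,
                                             long elapsedMillis) {
        if (finishedMaps <= 0) {
            // No completed mapper yet, so there is no basis for an estimate.
            return Double.POSITIVE_INFINITY;
        }
        double millisPerFinishedMap = (double) elapsedMillis / finishedMaps;
        return millisPerFinishedMap * (totalMaps - finishedMaps);
    }

    // Launch the reducers once the estimated remaining map time is no longer
    // than the estimated shuffle time, so that the shuffle is expected to end
    // close to the moment the last mapper finishes.
    // estimatedShuffleMillis is an assumed input; how to obtain it is open.
    static boolean shouldLaunchReducers(int finishedMaps, int totalMaps,
                                        long elapsedMillis,
                                        long estimatedShuffleMillis) {
        return estimateRemainingMapMillis(finishedMaps, totalMaps, elapsedMillis)
                <= estimatedShuffleMillis;
    }

    public static void main(String[] args) {
        // 5% of 10000 mappers done after one minute: too early to launch.
        System.out.println(shouldLaunchReducers(500, 10000, 60_000L, 120_000L));
        // 90% done after 18 minutes: remaining map time matches the shuffle.
        System.out.println(shouldLaunchReducers(9000, 10000, 1_080_000L, 120_000L));
    }
}
```

With the made-up numbers in main, the 5% case estimates about 19 minutes of 
map work remaining against a 2 minute shuffle, so launching then would leave 
the reducers waiting. A real estimate would likely need input from the 
scheduler itself rather than a simple linear extrapolation.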

Thoughts?

> Reducer should start faster for smaller jobs
> --------------------------------------------
>
>                 Key: MAPREDUCE-1463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>         Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch, 
> MAPREDUCE-1463-v3.patch
>
>
> Our users often complain about the slowness of smaller ad-hoc jobs.
> The overhead to wait for the reducers to start in this case is significant.
> It will be good if we can start the reducer sooner in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
