[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834371#action_12834371 ]

Scott Chen commented on MAPREDUCE-1463:
---------------------------------------

@Amar: Sorry for the late reply; I have just gotten back from vacation. Regarding 
your long-running-mapper argument, I think you are right: using task counts alone 
is not sufficient. Maybe we need more information than task counts to decide when 
to delay the reducers. Can you give me some suggestions? Setting 
mapreduce.job.reduce.slowstart.completedmaps to zero does improve the latency, 
but it hurts reducer utilization.
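
For reference, that knob can be set per job. A minimal sketch, assuming the 
new-API org.apache.hadoop.mapreduce.Job; the class name and job name are 
placeholders, and 0.0 is used only to illustrate the setting, not as a 
recommended default:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SlowstartExample {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // 0.0f lets reducers be scheduled before any map has completed;
        // shown only to illustrate the knob, not as a recommended default.
        conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.0f);
        Job job = new Job(conf, "ad-hoc-query");  // job name is a placeholder
        // ... set mapper/reducer classes and input/output paths, then submit
      }
    }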

I think the trade-off here is that we want to delay the reducers to improve 
reducer utilization, but we also want to minimize the impact of that delay on 
smaller jobs: the delay is significant for a small job, while it is acceptable 
for a large one. So the two cases should be treated differently. There should 
be a way to balance reducer utilization against small-job latency; a rough 
sketch of one possibility follows. Thoughts?
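
For illustration only (this is not what the attached patches do), one could pick 
the slowstart fraction from the number of map tasks, so small jobs launch their 
reducers almost immediately while large jobs keep the usual delay. The cutoff 
and the fraction values below are invented for the example:

    // Rough illustration only: the cutoff and fraction values are invented
    // for this sketch and are not part of any attached patch.
    public class SlowstartHeuristic {
      private static final int SMALL_JOB_MAP_LIMIT = 10;    // "small" cutoff (assumption)
      private static final float DEFAULT_SLOWSTART = 0.05f; // usual default fraction

      /** Fraction of completed maps at which reducers may be started. */
      public static float chooseSlowstart(int numMapTasks) {
        if (numMapTasks <= SMALL_JOB_MAP_LIMIT) {
          return 0.0f;             // small ad-hoc job: start reducers right away
        }
        return DEFAULT_SLOWSTART;  // large job: keep the delay to protect reduce slots
      }
    }

A smoother variant could interpolate between the two values instead of using a 
hard cutoff.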

> Reducer should start faster for smaller jobs
> --------------------------------------------
>
>                 Key: MAPREDUCE-1463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/fair-share
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>         Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch, 
> MAPREDUCE-1463-v3.patch
>
>
> Our users often complain about the slowness of smaller ad-hoc jobs.
> The overhead of waiting for the reducers to start is significant in this case.
> It would be good if we could start the reducers sooner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
