[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

Scott Chen (JIRA) Tue, 09 Feb 2010 17:10:53 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831797#action_12831797
 ]


Scott Chen commented on MAPREDUCE-1463:
---------------------------------------

@Todd: 
Yes, you're right. The logic in the patch is wrong. The one you posted is the 
correct logic. Sorry about the mistake.

@Amar: 
{quote}
How do you define small jobs. Shouldnt it be based on total number of tasks 
instead of considering maps and reduces individually?
{quote}
We want to start the reducers sooner in both the few-mapper and the 
few-reducer cases.
With few reducers, starting them earlier is cheap anyway. And with few 
mappers, the map phase finishes quickly.
But I think it may not be a bad idea to use the total number of tasks instead 
(it is simpler, at least). 
{quote}
Why do we need special case for small jobs? If its for fairness then this piece 
of code rightly belongs to contrib/fairscheduler, no?
If not for fairness then what is the problem with the current framework w.r.t 
small jobs?
{quote}
Handling small jobs as a special case reduces the overall latency, which 
gives users a better experience.
{quote}
Can be fixed by simple (configuration-like) tweaking?
If not then whats the right fix.
{quote}
For experienced users, setting completedmaps=0 does fix this problem. But it 
would be nice if this could be done automatically for other users who do not 
know how to configure Hadoop.


@Arun: 
Thanks for the comments. I agree. Tweaking 
mapreduce.job.reduce.slowstart.completedmaps on the job client side should be a 
cleaner way to do this. For experienced users, setting completedmaps to 0 on 
the client side will make their small jobs finish faster. But it would be nice 
if this decision could be made automatically, so that normal users don't 
have to learn how to configure an extra parameter.
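
To illustrate why a value of 0 helps, here is a minimal sketch of the 
slow-start gate. It assumes reducers become schedulable once 
ceil(fraction * totalMaps) maps have finished. The class and method names are 
hypothetical, and this is not the actual JobTracker code.

```java
class SlowstartGate {
    // Maps that must finish before reducers may be scheduled.
    // Assumed rule for illustration: ceil(fraction * totalMaps).
    static int requiredCompletedMaps(double slowstartFraction, int totalMaps) {
        return (int) Math.ceil(slowstartFraction * totalMaps);
    }

    // True once enough maps have finished for reducers to be scheduled.
    static boolean canScheduleReduces(double slowstartFraction,
                                      int totalMaps, int finishedMaps) {
        return finishedMaps >= requiredCompletedMaps(slowstartFraction, totalMaps);
    }

    public static void main(String[] args) {
        int totalMaps = 10;
        // With a fraction of 0, reducers are schedulable before any map finishes.
        System.out.println(canScheduleReduces(0.0, totalMaps, 0));
        // With a fraction of 0.5, reducers wait until 5 of the 10 maps are done.
        System.out.println(canScheduleReduces(0.5, totalMaps, 4));
        System.out.println(canScheduleReduces(0.5, totalMaps, 5));
    }
}
```

Under this assumption, a fraction of 0 removes the wait entirely, which is 
the effect an experienced user gets by setting the parameter by hand.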


The point here is that in some cases (a small job, or a small number of 
mappers or reducers) we should not spend time waiting for the reducers to 
start, because the waiting time is significant (or because it is cheap to 
start the reducers earlier). Automatically reducing the latency makes our 
users happy.
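
As a rough sketch of the kind of automatic decision I mean (not the code in 
the attached patches), the effective slow-start fraction could drop to 0 when 
either the map count or the reduce count is small. The class name, method 
name, and both thresholds below are hypothetical and chosen only for 
illustration.

```java
class SmallJobSlowstart {
    // Hypothetical thresholds; real values would need tuning or configuration.
    static final int SMALL_MAPS = 10;
    static final int SMALL_REDUCES = 10;

    // Returns the slow-start fraction to use for a job. Small jobs (few maps
    // OR few reduces) start their reducers immediately; other jobs keep the
    // configured value.
    static double effectiveSlowstart(double configured, int numMaps, int numReduces) {
        boolean small = numMaps <= SMALL_MAPS || numReduces <= SMALL_REDUCES;
        return small ? 0.0 : configured;
    }

    public static void main(String[] args) {
        double configured = 0.05; // example configured fraction
        System.out.println(effectiveSlowstart(configured, 5, 100));    // few maps
        System.out.println(effectiveSlowstart(configured, 1000, 3));   // few reduces
        System.out.println(effectiveSlowstart(configured, 1000, 100)); // large job
    }
}
```

Basing the check on the total number of tasks, as Amar suggests, would only 
change the condition that sets `small`.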

> Reducer should start faster for smaller jobs
> --------------------------------------------
>
>                 Key: MAPREDUCE-1463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/fair-share
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>         Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch
>
>
> Our users often complain about the slowness of smaller ad-hoc jobs.
> The overhead to wait for the reducers to start in this case is significant.
> It will be good if we can start the reducer sooner in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
