[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831797#action_12831797 ]
Scott Chen commented on MAPREDUCE-1463:
---------------------------------------

@Todd: Yes, you're right. The logic in the patch is wrong. The one you posted is the correct logic. Sorry about the mistake.

@Amar:
{quote}
How do you define small jobs. Shouldnt it be based on total number of tasks instead of considering maps and reduces individually?
{quote}
We want to start the reducers sooner both when there are few mappers and when there are few reducers. With few reducers, starting them earlier is cheap anyway. With few mappers, the map phase finishes quickly, so the reducers would not sit idle for long. But I think using the total instead may not be a bad idea (it is simpler, at least).

{quote}
Why do we need special case for small jobs? If its for fairness then this piece of code rightly belongs to contrib/fairscheduler, no? If not for fairness then what is the problem with the current framework w.r.t small jobs?
{quote}
Handling the special case for small jobs reduces their overall latency, which gives the users a better experience.

{quote}
Can be fixed by simple (configuration-like) tweaking? If not then whats the right fix.
{quote}
For experienced users, setting completedmaps=0 does fix this problem. But it would be nice if this could be done automatically for other users who do not know how to configure Hadoop.

@Arun: Thanks for the comments. I agree. Tweaking mapreduce.job.reduce.slowstart.completedmaps on the job client side should be a cleaner way to do this. For experienced users, setting completedmaps to 0 on the client side will make their small jobs finish faster. But it would be nice if the decision could be made automatically, so that normal users don't have to learn how to configure an extra parameter.

The point here is that in some cases (a small job, or a small number of mappers or reducers) we should not spend time waiting for the reducers to start, because the waiting time is significant relative to the job (or because it is cheap to start the reducers earlier).
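To make the idea concrete, here is a minimal sketch in plain Java of the kind of automatic decision described above. It is not the logic from either attached patch and it does not use any Hadoop classes. The class name, both method names, and the SMALL_JOB_MAX_TASKS threshold are hypothetical, and it uses the total task count as Amar suggested. The readiness check assumes that the slowstart value is a fraction of the job's maps that must complete before reducers are scheduled.

```java
// Illustrative sketch only: hypothetical names and threshold, no Hadoop dependency.
final class SlowstartSketch {

    // Hypothetical cut-off for what counts as a "small" job (total tasks).
    static final int SMALL_JOB_MAX_TASKS = 10;

    // Returns the slowstart fraction to use for a job.
    // Small jobs get 0, so reducers may start immediately;
    // larger jobs keep whatever fraction was configured.
    static float chooseSlowstart(int numMaps, int numReduces, float configured) {
        boolean smallJob = numMaps + numReduces <= SMALL_JOB_MAX_TASKS;
        return smallJob ? 0.0f : configured;
    }

    // Assumed readiness rule: reducers may be scheduled once at least
    // ceil(slowstart * numMaps) maps have completed.
    static boolean canScheduleReduces(int completedMaps, int numMaps, float slowstart) {
        return completedMaps >= Math.ceil(slowstart * numMaps);
    }
}
```

With these assumptions, a job with 3 maps and 1 reduce gets a fraction of 0 and can schedule its reducer before any map finishes, while a job with 100 maps and 10 reduces keeps its configured fraction.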
Automatically reducing the latency makes our users happy.

> Reducer should start faster for smaller jobs
> --------------------------------------------
>
>                 Key: MAPREDUCE-1463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/fair-share
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>         Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch
>
>
> Our users often complain about the slowness of smaller ad-hoc jobs.
> The overhead to wait for the reducers to start in this case is significant.
> It will be good if we can start the reducer sooner in this case.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.