[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118661#comment-16118661
 ] 

Haibo Chen commented on MAPREDUCE-6870:
---------------------------------------

A few nits:
1) Use block comment /* **/ for checkReadyForCompletionWhenAllReducersDone()
2) We can avoid iterating over all map tasks if job.completingJob is already 
true, that is,
{code}
if (totalReduces > 0 && totalReduces == completedReduces) {
    if (!job.completingJob) {
        // Walk the map tasks only the first time all reducers are done.
        for (Task task : mapTasks) {
            // kill the task if it is not finished
        }
        job.completingJob = true;
    }
}
{code}
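
To make the intent of the guard concrete, here is a minimal, self-contained 
sketch. It is not the real JobImpl: the class CompletionGuardSketch, the 
MapTask model, and the mapTaskScans counter are hypothetical and exist only 
to show that the map task list is walked at most once.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the job state discussed above; not the real JobImpl. */
class CompletionGuardSketch {

    /** Minimal task model: only tracks whether the task finished or was killed. */
    static final class MapTask {
        final boolean finished;
        boolean killed;

        MapTask(boolean finished) {
            this.finished = finished;
        }
    }

    final List<MapTask> mapTasks = new ArrayList<>();
    int totalReduces;
    int completedReduces;
    boolean completingJob;
    int mapTaskScans; // illustration only: counts how often the map task list is walked

    /** Kills unfinished mappers once all reducers are done, and does so at most once. */
    void checkReadyForCompletionWhenAllReducersDone() {
        if (totalReduces > 0 && totalReduces == completedReduces) {
            if (!completingJob) {
                mapTaskScans++;
                for (MapTask task : mapTasks) {
                    if (!task.finished) {
                        task.killed = true;
                    }
                }
                completingJob = true;
            }
        }
    }
}
```

Calling the method again after completingJob has been set is then a cheap 
no-op, which is the point of the suggestion.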

3) Can we remove "assertJobState(job, JobStateInternal.RUNNING)" in 
TestJobImpl.testRunningMapperPreemptionWhenReducerIsFinished() since it is not 
doing anything, and add a comment before the "if(killMappers)" statement saying 
that the stubbed job cannot finish and we therefore verify task kill events 
instead?

4) The description of mapreduce.job.finish-when-all-reducers-done in 
mapred-default.xml still says that running map tasks are terminated. I think 
we should say something like 
'Specifies whether the job should complete once all reducers have finished, 
regardless of whether there are still running mappers', which is closer to what 
really matters to end users. Related to this, we can rename 
testRunningMappersPreemptedWhenReducerIsFinished and 
testRunningMappersNotPreemptedWhenReducerIsFinished to something like 
'testJobCompletedWhenAllReducersAreFinished' and 
'testJobNotCompletedWhenAllReducersAreFinished'.


> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for a long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.



