[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111487#comment-16111487
 ] 

Haibo Chen commented on MAPREDUCE-6870:
---------------------------------------

bq. 1. Variable names (is preemptMappersOnReduceFinish good?)
I think we can rename JobImpl.preemptMappersOnReduceFinish, 
MRJobConfig.PREEMPT_MAPPERS_ON_REDUCE_FINISH and 
MRJobConfig.DEFAULT_PREEMPT_MAPPERS_ON_REDUCE_FINISH to convey more clearly 
that, from the users' perspective, our intention is to finish the job (the 
configuration is user-facing). The same goes for the description of 
mapreduce.job.finish-when-all-reducers-done in mapred-default.xml.

bq. 2. Added a new method to MapTaskImpl with locking, which is probably not 
necessary but I felt it's better to have it anyway
What I meant previously is that we are creating duplicate TA_KILL events in 
JobImpl.checkReadyForCompletionWhenAllReducersDone(). I am not sure adding 
MapTaskImpl.preemptedDueToReducerFinish is going to help in this case, because 
we will still be sending duplicate TA_KILL events. I was proposing something 
like the following:
{code}
    private void checkReadyForCompletionWhenAllReducersDone(JobImpl job) {
      if (job.preemptRestartedMappersOnReduceFinish) {
        int totalReduces = job.getTotalReduces();
        int completedReduces = job.getCompletedReduces();
        // only if all reducers have finished and we have not sent TA_KILL
        // events to running map tasks before
        if (totalReduces > 0 && totalReduces == completedReduces
            && !inCompletingJob) {
          for (TaskId mapTaskId : job.mapTasks) {
            MapTaskImpl task = (MapTaskImpl) job.tasks.get(mapTaskId);
            if (!task.isFinished() && !task.isPreemptedDueToReducerFinish()) {
              LOG.info("Killing map task " + task.getID());
              task.setPreemptedDueToReducerFinish(true);
              job.eventHandler.handle(
                  new TaskEvent(task.getID(), TaskEventType.T_KILL));
            }
          }
          // remember that the kill events were sent so they are not re-sent
          inCompletingJob = true;
        }
      }
    }
{code}
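To make the intent of the guard concrete, here is a minimal, self-contained sketch that does not use any Hadoop classes. FakeMapTask, KillOnceSketch and the string event list are hypothetical stand-ins for MapTaskImpl, JobImpl and the event handler; only the control flow mirrors the proposal above, and it is meant as an illustration rather than as the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a map task; not the real MapTaskImpl.
class FakeMapTask {
  final String id;
  final boolean finished;
  boolean preemptedDueToReducerFinish = false;

  FakeMapTask(String id, boolean finished) {
    this.id = id;
    this.finished = finished;
  }
}

// Hypothetical stand-in for the job; models only the kill-once guard.
class KillOnceSketch {
  final List<FakeMapTask> mapTasks = new ArrayList<>();
  // Records every kill event as "T_KILL:<task id>" instead of dispatching it.
  final List<String> sentEvents = new ArrayList<>();
  int totalReduces;
  int completedReduces;
  boolean inCompletingJob = false;

  void checkReadyForCompletionWhenAllReducersDone() {
    // Act only once: all reducers are done and no kill events were sent yet.
    if (totalReduces > 0 && totalReduces == completedReduces
        && !inCompletingJob) {
      for (FakeMapTask task : mapTasks) {
        if (!task.finished && !task.preemptedDueToReducerFinish) {
          task.preemptedDueToReducerFinish = true;
          sentEvents.add("T_KILL:" + task.id);
        }
      }
      inCompletingJob = true;
    }
  }

  public static void main(String[] args) {
    KillOnceSketch job = new KillOnceSketch();
    job.totalReduces = 2;
    job.completedReduces = 2;
    job.mapTasks.add(new FakeMapTask("m0", true));   // already finished
    job.mapTasks.add(new FakeMapTask("m1", false));  // still running
    // Calling the check twice must not produce a second kill event.
    job.checkReadyForCompletionWhenAllReducersDone();
    job.checkReadyForCompletionWhenAllReducersDone();
    System.out.println(job.sentEvents); // prints [T_KILL:m1]
  }
}
```

In this model the job-level flag alone is enough to stop the second call from sending anything; the per-task flag only matters if the check can run again with the job-level flag cleared.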

In TestJobImpl, can you explain to me what this code is doing?
{code}
    // finish mappers - if the mapper is preempted, its state will be
    // KILLED, set by KillWaitAttemptKilledTransition
    for (TaskId taskId: job.tasks.keySet()) {
      TaskState state = killMappers ? TaskState.KILLED : TaskState.SUCCEEDED;
      if (taskId.getTaskType() == TaskType.MAP) {
        job.handle(new JobTaskEvent(taskId, state));
      }
    }
{code}
Also, do we expect the job to succeed even when killMappers is set to false? 
If killMappers is false, I'd expect the mappers to stay in RUNNING state and 
thus hang the job. In other words, the job should stay in RUNNING state 
(after we drain the events in the dispatcher event queue). No?
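
The expectation behind this question can be written down as a small model. This is only a sketch of the reasoning, not of the real JobImpl state machine: ExpectedStateSketch and its State enum are hypothetical, and it assumes a job leaves RUNNING only once none of its tasks is still RUNNING.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the expectation; not the real JobImpl state machine.
class ExpectedStateSketch {
  enum State { RUNNING, KILLED, SUCCEEDED }

  // Assumption: the job stays RUNNING while any task is RUNNING, and
  // otherwise succeeds (failures are deliberately left out of this model).
  static State expectedJobState(List<State> taskStates) {
    for (State s : taskStates) {
      if (s == State.RUNNING) {
        return State.RUNNING;
      }
    }
    return State.SUCCEEDED;
  }

  public static void main(String[] args) {
    // Mappers killed, reducer succeeded: nothing is RUNNING any more.
    System.out.println(expectedJobState(
        Arrays.asList(State.KILLED, State.SUCCEEDED)));  // prints SUCCEEDED
    // Mapper neither killed nor finished: the job cannot complete.
    System.out.println(expectedJobState(
        Arrays.asList(State.RUNNING, State.SUCCEEDED))); // prints RUNNING
  }
}
```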


> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for a long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.


