[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123493#comment-16123493
 ] 

Erik Krogen commented on MAPREDUCE-6870:
----------------------------------------

[~haibo.chen], [~pbacsko], thank you for working on this! To provide some 
context, the reason we wanted it to be configurable is in case mapper tasks 
have side effects which are expected to be executed in full. For example, you 
may have a map task which deletes an output directory as it starts, then 
populates that directory. With this patch in effect, you could potentially wipe 
the output of a previous map task's execution and then never fully repopulate 
it (since the mapper is preempted). It's a pretty niche case, but who knows what 
MR behavior people might be relying on.
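The hazard described above (wipe an output directory, then repopulate it, and get killed partway) can be reproduced without a cluster. The sketch below is a minimal, Hadoop-free model. It assumes a local temp directory stands in for the task's output location, and it simulates preemption with a hypothetical `preemptAfter` counter. `SideEffectMapperSketch`, `runMapSideEffect` and `countFiles` are illustrative names only, not Hadoop APIs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical, Hadoop-free model of a map task with a "clear then repopulate" side effect. */
public class SideEffectMapperSketch {

    /**
     * Deletes outDir, recreates it, then writes totalFiles part files.
     * Stops after preemptAfter files to simulate the attempt being killed (assumption).
     */
    static void runMapSideEffect(Path outDir, int totalFiles, int preemptAfter) throws IOException {
        // Step 1: wipe whatever a previous attempt left behind (children first, then the dir).
        if (Files.exists(outDir)) {
            try (Stream<Path> walk = Files.walk(outDir)) {
                walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
        }
        Files.createDirectories(outDir);
        // Step 2: repopulate; a kill partway through leaves the directory incomplete.
        for (int i = 0; i < totalFiles; i++) {
            if (i == preemptAfter) {
                return; // simulated preemption: remaining files are never written
            }
            Files.writeString(outDir.resolve("part-" + i), "data " + i);
        }
    }

    /** Counts the entries directly inside dir. */
    static long countFiles(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("mr6870").resolve("output");
        runMapSideEffect(out, 5, Integer.MAX_VALUE); // an earlier attempt completes
        System.out.println("after full run: " + countFiles(out));      // 5
        runMapSideEffect(out, 5, 2);                 // a later attempt wipes, then is killed
        System.out.println("after preempted run: " + countFiles(out)); // 2
    }
}
```

The second run leaves only two of the five files: the earlier, complete output is gone, and the job would still report success if it finishes as soon as the reducers are done.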

Given that this patch is enabling the new behavior by default, should this be 
marked as an incompatible change? Ping [~templedf], [~andrew.wang], who I know 
are working on compatibility guidelines.

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>             Fix For: 3.0.0-beta1
>
>         Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for a long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materializable outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.
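The completion rule requested in the issue can be modelled roughly as follows. This is a hypothetical sketch, not MRAppMaster code. `JobFinishSketch`, `TaskState`, `canFinishJob` and the boolean `finishWhenReducersDone` are illustrative stand-ins, and the boolean is not the actual configuration key. The sketch also assumes that map-only jobs (no reducers) must keep waiting for every mapper, since the option only makes sense when reducers consume all map output.

```java
import java.util.List;

/** Hypothetical model of the "finish when all reducers are done" decision. */
public class JobFinishSketch {

    enum TaskState { RUNNING, SUCCEEDED }

    /**
     * Returns true if the job may be declared finished.
     * finishWhenReducersDone is a hypothetical stand-in for the configuration switch.
     */
    static boolean canFinishJob(boolean finishWhenReducersDone,
                                List<TaskState> maps, List<TaskState> reduces) {
        boolean allMapsDone = maps.stream().allMatch(s -> s == TaskState.SUCCEEDED);
        if (reduces.isEmpty()) {
            return allMapsDone; // map-only job: map output is the job output
        }
        boolean allReducesDone = reduces.stream().allMatch(s -> s == TaskState.SUCCEEDED);
        if (!allReducesDone) {
            return false; // reducers may still need map output
        }
        // With the option on, still-running mappers are no longer needed and can be killed.
        return finishWhenReducersDone || allMapsDone;
    }

    public static void main(String[] args) {
        List<TaskState> maps = List.of(TaskState.SUCCEEDED, TaskState.RUNNING);
        List<TaskState> reduces = List.of(TaskState.SUCCEEDED, TaskState.SUCCEEDED);
        System.out.println(canFinishJob(true, maps, reduces));  // true
        System.out.println(canFinishJob(false, maps, reduces)); // false
    }
}
```

Guarding the map-only case explicitly matters: an empty reducer list trivially satisfies "all reducers complete", so without that check the option would end map-only jobs while their mappers are still running.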



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
