[ https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680580#comment-14680580 ]
Ben Podgursky commented on MAPREDUCE-5817:
------------------------------------------
Is this ticket on any roadmaps? We're running into a problem which I think is
an extreme case of this issue: we have a lot of MR jobs which are map-only, and
when a NodeManager goes unhealthy while other map tasks are finishing, the map
tasks that already completed on that node are re-run, even though there are no
reducers at all. If the tasks are slow, this is a huge waste of time.
> mappers get rescheduled on node transition even after all reducers are
> completed
> --------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5817
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Affects Versions: 2.3.0
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Labels: BB2015-05-TBR
> Attachments: mapreduce-5817.patch
>
>
> We're seeing a behavior where a job runs long after all reducers were already
> finished. We found that the job was rescheduling and running a number of
> mappers beyond the point of reducer completion. In one situation, the job ran
> for some 9 more hours after all reducers completed!
> This happens because whenever a node transition (to an unusable state) comes
> into the app master, it just reschedules all mappers that already ran on the
> node in all cases.
> Therefore, any node transition has the potential to extend the job period.
> Once this window opens, another node transition can prolong it, and this can
> happen indefinitely in theory.
> If there is some instability in the pool (unhealthy, etc.) for a duration,
> then any big job is severely vulnerable to this problem.
> If all reducers have been completed, JobImpl.actOnUnusableNode() should not
> reschedule mapper tasks. If all reducers are completed, the mapper outputs
> are no longer needed, and there is no need to reschedule mapper tasks as they
> would not be consumed anyway.
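
To make the proposed check concrete, here is a minimal standalone sketch of the
decision described above. It is not the actual JobImpl code and not the attached
patch: the class and method names (UnusableNodeSketch, shouldRescheduleMaps,
mapsToReschedule) are hypothetical, and the job state is reduced to two
counters. It also assumes, following the comment above, that a job with zero
reducers never needs its completed maps re-run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of the guard proposed for
// JobImpl.actOnUnusableNode(). None of these names come from Hadoop.
public class UnusableNodeSketch {

    // Returns true only if some reducer may still need the map outputs.
    static boolean shouldRescheduleMaps(int totalReducers, int completedReducers) {
        // Assumption: in a map-only job no reducer ever fetches map output,
        // so there is nothing to regenerate.
        if (totalReducers == 0) {
            return false;
        }
        // Once every reducer has completed, the map outputs are no longer consumed.
        return completedReducers < totalReducers;
    }

    // Given the maps that already completed on a node that became unusable,
    // returns the ones that should be re-run.
    static List<String> mapsToReschedule(List<String> completedMapsOnNode,
                                         int totalReducers,
                                         int completedReducers) {
        if (!shouldRescheduleMaps(totalReducers, completedReducers)) {
            return Collections.emptyList();
        }
        return new ArrayList<>(completedMapsOnNode);
    }

    public static void main(String[] args) {
        List<String> maps = Arrays.asList("map_0", "map_1");
        // Reducers still running: both maps are rescheduled.
        System.out.println(mapsToReschedule(maps, 4, 2));
        // All reducers completed: nothing is rescheduled.
        System.out.println(mapsToReschedule(maps, 4, 4));
        // Map-only job: nothing is rescheduled.
        System.out.println(mapsToReschedule(maps, 0, 0));
    }
}
```

With a guard of this shape, a node transition that arrives after the last
reducer finishes (or during a map-only job) would no longer reopen the window
that lets the job keep running.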
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)