[
https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697606#comment-14697606
]
Hudson commented on MAPREDUCE-5817:
-----------------------------------
FAILURE: Integrated in Hadoop-trunk-Commit #8306 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/8306/])
MAPREDUCE-5817. Mappers get rescheduled on node transition even after all
reducers are completed. (Sangjin Lee via kasha) (kasha: rev
27d24f96ab8d17e839a1ef0d7076efc78d28724a)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java
> Mappers get rescheduled on node transition even after all reducers are
> completed
> --------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5817
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Affects Versions: 2.3.0
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Attachments: MAPREDUCE-5817.001.patch, MAPREDUCE-5817.002.patch,
> mapreduce-5817.patch
>
>
> We're seeing a behavior where a job keeps running long after all reducers
> have finished. We found that the job was rescheduling and running a number
> of mappers past the point of reducer completion. In one case, the job ran
> for some 9 more hours after all reducers completed!
> This happens because whenever a node transition (to an unusable state)
> reaches the app master, it unconditionally reschedules all mappers that
> previously ran on that node.
> As a result, any node transition has the potential to extend the job's
> lifetime. Once this window opens, another node transition can prolong it
> further, and in theory this can repeat indefinitely.
> If the node pool is unstable for any length of time (unhealthy nodes,
> etc.), any large job is severely vulnerable to this problem.
> Once all reducers have completed, JobImpl.actOnUnusableNode() should not
> reschedule mapper tasks: the mapper outputs are no longer needed at that
> point, so the rescheduled mappers' output would never be consumed.
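For reference, a minimal self-contained sketch of the guard described above.
The field and method names here (totalReduceCount, completedReduceCount,
rescheduleMapsThatRanOn) are illustrative assumptions, not the actual
JobImpl API; see the committed patch to JobImpl.java for the real change:

    // Illustrative sketch only; names are hypothetical, not the actual
    // org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl internals.
    class JobSketch {
      private int totalReduceCount;      // total reducers in the job
      private int completedReduceCount;  // reducers that have finished

      // Invoked when a node becomes unusable (lost, unhealthy, etc.).
      void actOnUnusableNode(String nodeId) {
        // Guard from MAPREDUCE-5817: once every reducer has completed,
        // no one will ever fetch map output again, so rescheduling the
        // maps that ran on the bad node is pure wasted work.
        if (totalReduceCount > 0 && completedReduceCount == totalReduceCount) {
          return;
        }
        rescheduleMapsThatRanOn(nodeId); // hypothetical helper
      }

      private void rescheduleMapsThatRanOn(String nodeId) {
        // ... re-run map task attempts whose output lived on nodeId ...
      }
    }

The key design point is that the check is skipped for map-only jobs
(totalReduceCount == 0), where map output is the job's actual result and
rescheduling is still required.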
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)