[ https://issues.apache.org/jira/browse/MAPREDUCE-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994678#comment-14994678 ]
Wangda Tan commented on MAPREDUCE-6541:
---------------------------------------

[~Naganarasimha], that's also different: MAPREDUCE-6514 fixes a bug in the existing reducer preemption logic (the reducer request is removed locally but the RM is not notified). This issue is an enhancement to how available mapper slots are calculated.

> Exclude pending reducer memory when calculating available mapper slots from
> headroom to avoid deadlock
> -------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6541
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6541
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Wangda Tan
>
> We saw an MR deadlock recently:
> - When NMs are restarted by the framework without recovery enabled, containers
> running on these nodes are identified as "ABORTED", and the MR AM tries to
> reschedule the "ABORTED" mapper containers.
> - Since such lost mappers are "ABORTED" containers, the MR AM gives normal mapper
> priority (priority=20) to these mapper requests. If there is any pending
> reducer (priority=10) at the same time, the mapper requests must wait until the
> reducer requests are satisfied.
> - In our test, one mapper needed 700+ MB, a reducer needed 1000+ MB, and the RM's
> available resource = mapper-request = (700+ MB). Only one job was running in
> the system, so the scheduler could not allocate more reducer containers AND the
> MR AM thought there was enough headroom for mappers, so reducer containers were
> not preempted.
> MAPREDUCE-6302 can solve most of the problem, but on the other hand, I think
> we may need to exclude pending reducers' resources when calculating
> #available-mapper-slots from headroom. That way we can avoid excessive reducer
> preemption.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
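The calculation the comment proposes can be sketched as below. This is a hypothetical illustration, not actual Hadoop code: the class and method names (`MapperSlotEstimator`, `availableMapperSlots`) are invented for this example, and the real MR AM logic lives in `RMContainerAllocator`. The idea is simply to reserve memory for pending reducers before deciding how many mapper containers fit in the RM-reported headroom.

```java
// Hypothetical sketch of excluding pending reducer memory from headroom
// before computing available mapper slots (not real Hadoop MR AM code).
public class MapperSlotEstimator {

    /**
     * @param headroomMb      headroom reported by the RM, in MB
     * @param pendingReducers number of reducer requests not yet satisfied
     * @param reducerMemMb    memory per reducer container, in MB
     * @param mapperMemMb     memory per mapper container, in MB
     * @return number of mapper containers that still fit after
     *         reserving memory for the pending reducers
     */
    public static int availableMapperSlots(long headroomMb,
                                           int pendingReducers,
                                           long reducerMemMb,
                                           long mapperMemMb) {
        long reservedForReducers = (long) pendingReducers * reducerMemMb;
        long remaining = Math.max(0, headroomMb - reservedForReducers);
        return (int) (remaining / mapperMemMb);
    }
}
```

With the numbers from the issue (mapper ~700 MB, reducer ~1000 MB, headroom = 700 MB, one pending reducer), the naive calculation `700 / 700` yields 1 mapper slot, so the AM believes a mapper will fit and never preempts a reducer. After subtracting the pending reducer's 1000 MB, the result is 0 slots, so the AM correctly concludes it must preempt to make progress.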