[ https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli updated MAPREDUCE-6514:
-----------------------------------------------
    Status: Open  (was: Patch Available)

h4. Comment on current patch
You should look at the {{rampDownReduces()}} API and use it instead of hand-rolling {{decContainerReq}}. I actually think that once we do this, you should remove {{clearAllPendingReduceRequests()}} altogether.

Looking at branch-2, I think the current patch is better served on top of MAPREDUCE-6302 (and then only in 2.8+), given the numerous changes made there. The patch obviously doesn't apply on branch-2.7, which you set as the target version (2.7.2). Canceling the patch.

h4. Meta thought
If MAPREDUCE-6513 goes through per my latest proposal there, there is no need to cancel all the reduce asks, and thus no need for this patch, no?

h4. Release
In any case, this has been a long-standing problem (though I'm very surprised nobody caught it till now), so I'd propose we move this out to 2.7.3, or better 2.8+, so I can make progress on the 2.7.2 release. Thoughts?

> Job hangs as ask is not updated after ramping down of all reducers
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6514
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>            Priority: Critical
>         Attachments: MAPREDUCE-6514.01.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled reduces map and move these reducers to pending. This is not reflected in the ask, so the RM keeps assigning reduce containers and the AM cannot use them, because no reducer is scheduled (see the logs below the code).
> If the ask is updated immediately, the RM can schedule mappers right away, which is the intention of ramping down reducers anyway.
> The scheduler need not allocate for ramped-down reducers. If not handled, this can lead to map starvation, as pointed out in MAPREDUCE-6513.
> {code}
>       LOG.info("Ramping down all scheduled reduces:"
>           + scheduledRequests.reduces.size());
>       for (ContainerRequest req : scheduledRequests.reduces.values()) {
>         pendingReduces.add(req);
>       }
>       scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign container Container: [ContainerId: container_1437451211867_1485_01_000216, NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: hdszzdcxdat6g06u04p:26010, Resource: <memory:4096, vCores:1>, Priority: 10, Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a reduce as either container memory less than required 4096 or no pending reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign container Container: [ContainerId: container_1437451211867_1485_01_000217, NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: hdszzdcxdat6g06u06p:26010, Resource: <memory:4096, vCores:1>, Priority: 10, Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a reduce as either container memory less than required 4096 or no pending reduce tasks - reduces.isEmpty=true
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
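The failure mode described above can be sketched with a toy model. This is not Hadoop code: the class and field names ({{Allocator}}, {{outstandingReduceAsk}}, etc.) are hypothetical stand-ins for the AM's scheduled/pending reduce maps and its ask table, chosen only to show why moving scheduled reduces to pending without withdrawing the corresponding asks leaves the RM believing reducers are still wanted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch (not Hadoop's actual classes) of the ramp-down bug and fix.
public class RampDownSketch {

  static class Allocator {
    // Stand-ins for scheduledRequests.reduces, pendingReduces, and the
    // outstanding reduce ask the RM sees.
    final Map<String, String> scheduledReduces = new HashMap<>();
    final List<String> pendingReduces = new ArrayList<>();
    int outstandingReduceAsk = 0;

    void scheduleReduce(String id) {
      scheduledReduces.put(id, id);
      outstandingReduceAsk++; // ask sent to the RM
    }

    // Buggy ramp-down, mirroring the snippet in the description:
    // the scheduled map is cleared but the ask is never updated, so
    // the RM keeps assigning reduce containers the AM cannot use.
    void rampDownBuggy() {
      pendingReduces.addAll(scheduledReduces.values());
      scheduledReduces.clear(); // ask still claims we want reducers
    }

    // Fixed ramp-down: withdraw the ask for every request we move to
    // pending, analogous to decrementing the container request
    // (decContainerReq / rampDownReduces) per scheduled reduce.
    void rampDownFixed() {
      for (String req : scheduledReduces.values()) {
        pendingReduces.add(req);
        outstandingReduceAsk--; // tell the RM this reducer is withdrawn
      }
      scheduledReduces.clear();
    }
  }

  public static void main(String[] args) {
    Allocator buggy = new Allocator();
    buggy.scheduleReduce("r1");
    buggy.scheduleReduce("r2");
    buggy.rampDownBuggy();
    // RM still sees 2 outstanding reduce asks -> wasted assignments,
    // while reduces.isEmpty is true on the AM side.
    System.out.println("buggy ask=" + buggy.outstandingReduceAsk);

    Allocator fixed = new Allocator();
    fixed.scheduleReduce("r1");
    fixed.scheduleReduce("r2");
    fixed.rampDownFixed();
    System.out.println("fixed ask=" + fixed.outstandingReduceAsk);
  }
}
```

In the buggy variant the stale ask stays at 2 after ramp-down, so the RM keeps granting reduce containers and mappers starve; in the fixed variant the ask drops to 0, which is what the proposed use of {{rampDownReduces()}} achieves on the real allocator.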