[ https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli updated MAPREDUCE-6514:
-----------------------------------------------
    Status: Open  (was: Patch Available)

h4. Comment on current patch
You should look at the {{rampDownReduces()}} API and use it instead of hand-rolling {{decContainerReq}}. I actually think that once we do this, you should remove {{clearAllPendingReduceRequests()}} altogether.

Looking at branch-2, I think the current patch is better served on top of MAPREDUCE-6302 (and then only in 2.8+), given the numerous changes made there. The patch obviously doesn't apply on branch-2.7, which you set as the target version (2.7.2). Canceling the patch.

h4. Meta thought
If MAPREDUCE-6513 goes through per my latest proposal there, there is no need to cancel all the reduce asks, and thus no need for this patch, no?

h4. Release
In any case, this has been a long-standing problem (though I'm very surprised nobody caught it till now), so I'd propose we move this out to 2.7.3, or better 2.8+, so I can make progress on the 2.7.2 release. Thoughts?

> Job hangs as ask is not updated after ramping down of all reducers
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6514
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>            Priority: Critical
>         Attachments: MAPREDUCE-6514.01.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled reduces map and move these reducers to pending. This is not reflected in the ask, so the RM keeps assigning reduce containers and the AM cannot use them, because no reducer is scheduled (see the logs below the code).
> If the ask is updated immediately, the RM can schedule mappers right away, which is the intention of ramping down reducers anyway.
> The scheduler need not allocate for ramped-down reducers. If not handled, this can lead to map starvation, as pointed out in MAPREDUCE-6513.
> {code}
>       LOG.info("Ramping down all scheduled reduces:"
>           + scheduledRequests.reduces.size());
>       for (ContainerRequest req : scheduledRequests.reduces.values()) {
>         pendingReduces.add(req);
>       }
>       scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign container Container: [ContainerId: container_1437451211867_1485_01_000216, NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: hdszzdcxdat6g06u04p:26010, Resource: <memory:4096, vCores:1>, Priority: 10, Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a reduce as either container memory less than required 4096 or no pending reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign container Container: [ContainerId: container_1437451211867_1485_01_000217, NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: hdszzdcxdat6g06u06p:26010, Resource: <memory:4096, vCores:1>, Priority: 10, Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a reduce as either container memory less than required 4096 or no pending reduce tasks - reduces.isEmpty=true
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
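The failure mode described above can be sketched with a toy model. This is not Hadoop code: the class and field names ({{Allocator}}, {{outstandingReduceAsk}}, etc.) are hypothetical stand-ins for the AM's scheduled/pending reduce maps and its ask table, chosen only to show why moving scheduled reduces to pending without withdrawing the corresponding asks leaves the RM believing reducers are still wanted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch (not Hadoop's actual classes) of the ramp-down bug and fix.
public class RampDownSketch {

  static class Allocator {
    // Stand-ins for scheduledRequests.reduces, pendingReduces, and the
    // outstanding reduce ask the RM sees.
    final Map<String, String> scheduledReduces = new HashMap<>();
    final List<String> pendingReduces = new ArrayList<>();
    int outstandingReduceAsk = 0;

    void scheduleReduce(String id) {
      scheduledReduces.put(id, id);
      outstandingReduceAsk++; // ask sent to the RM
    }

    // Buggy ramp-down, mirroring the snippet in the description:
    // the scheduled map is cleared but the ask is never updated, so
    // the RM keeps assigning reduce containers the AM cannot use.
    void rampDownBuggy() {
      pendingReduces.addAll(scheduledReduces.values());
      scheduledReduces.clear(); // ask still claims we want reducers
    }

    // Fixed ramp-down: withdraw the ask for every request we move to
    // pending, analogous to decrementing the container request
    // (decContainerReq / rampDownReduces) per scheduled reduce.
    void rampDownFixed() {
      for (String req : scheduledReduces.values()) {
        pendingReduces.add(req);
        outstandingReduceAsk--; // tell the RM this reducer is withdrawn
      }
      scheduledReduces.clear();
    }
  }

  public static void main(String[] args) {
    Allocator buggy = new Allocator();
    buggy.scheduleReduce("r1");
    buggy.scheduleReduce("r2");
    buggy.rampDownBuggy();
    // RM still sees 2 outstanding reduce asks -> wasted assignments,
    // while reduces.isEmpty is true on the AM side.
    System.out.println("buggy ask=" + buggy.outstandingReduceAsk);

    Allocator fixed = new Allocator();
    fixed.scheduleReduce("r1");
    fixed.scheduleReduce("r2");
    fixed.rampDownFixed();
    System.out.println("fixed ask=" + fixed.outstandingReduceAsk);
  }
}
```

In the buggy variant the stale ask stays at 2 after ramp-down, so the RM keeps granting reduce containers and mappers starve; in the fixed variant the ask drops to 0, which is what the proposed use of {{rampDownReduces()}} achieves on the real allocator.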