[jira] [Updated] (MAPREDUCE-6514) Job hangs as ask is not updated after ramping down of all reducers

Varun Saxena (JIRA) Thu, 29 Oct 2015 13:06:03 -0700

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Varun Saxena updated MAPREDUCE-6514:
------------------------------------
    Description: 
In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled 
reduces map and move these reducers to pending. This change is not reflected 
in the ask, so the RM keeps allocating containers for them, and the AM cannot 
assign those containers because no reducer is scheduled (see the logs below 
the code).
If the ask is updated immediately, the RM can schedule mappers right away, 
which is the intention when we ramp down reducers.
The scheduler need not allocate containers for ramped-down reducers.
If this is not handled, it can lead to map starvation, as pointed out in 
MAPREDUCE-6513.
{code}
LOG.info("Ramping down all scheduled reduces:"
    + scheduledRequests.reduces.size());
for (ContainerRequest req : scheduledRequests.reduces.values()) {
  pendingReduces.add(req);
}
scheduledRequests.reduces.clear();
{code}
{noformat}
2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
assigned : container_1437451211867_1485_01_000215
2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
container Container: [ContainerId: container_1437451211867_1485_01_000216, 
NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: hdszzdcxdat6g06u04p:26010, 
Resource: <memory:4096, vCores:1>, Priority: 10, Token: Token { kind: 
ContainerToken, service: 10.2.33.236:26009 }, ] for a reduce as either  
container memory less than required 4096 or no pending reduce tasks - 
reduces.isEmpty=true
2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
assigned : container_1437451211867_1485_01_000216
2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
container Container: [ContainerId: container_1437451211867_1485_01_000217, 
NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: hdszzdcxdat6g06u06p:26010, 
Resource: <memory:4096, vCores:1>, Priority: 10, Token: Token { kind: 
ContainerToken, service: 10.2.33.239:26009 }, ] for a reduce as either  
container memory less than required 4096 or no pending reduce tasks - 
reduces.isEmpty=true
{noformat}
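
The gap described above can be illustrated with a small stand-alone model. 
This is only a sketch under stated assumptions: the class AskRampDownSketch 
and all of its fields and methods are hypothetical and are not part of 
RMContainerAllocator or any Hadoop API, and the ask is reduced to a single 
counter of outstanding reduce container requests. It contrasts a ramp-down 
that only moves local bookkeeping with one that also withdraws the requests 
from the ask.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: every name below is hypothetical, not a Hadoop class or method.
final class AskRampDownSketch {
    // Number of reduce containers the RM still believes the AM is asking for.
    private int reduceAsk = 0;
    private final Set<String> scheduledReduces = new LinkedHashSet<>();
    private final List<String> pendingReduces = new ArrayList<>();

    // Scheduling a reducer registers it locally and adds one container to the ask.
    void scheduleReduce(String taskId) {
        if (scheduledReduces.add(taskId)) {
            reduceAsk++;
        }
    }

    // Behaviour described in the report: local state moves to pending, ask is untouched.
    void rampDownAllWithoutAskUpdate() {
        pendingReduces.addAll(scheduledReduces);
        scheduledReduces.clear();
    }

    // Assumed remedy: withdraw one container from the ask per ramped-down reducer.
    void rampDownAllWithAskUpdate() {
        for (String taskId : scheduledReduces) {
            pendingReduces.add(taskId);
            reduceAsk--;
        }
        scheduledReduces.clear();
    }

    int reduceAsk() { return reduceAsk; }
    int scheduledCount() { return scheduledReduces.size(); }
    int pendingCount() { return pendingReduces.size(); }

    public static void main(String[] args) {
        AskRampDownSketch stale = new AskRampDownSketch();
        AskRampDownSketch updated = new AskRampDownSketch();
        for (String id : new String[] {"r_0", "r_1", "r_2"}) {
            stale.scheduleReduce(id);
            updated.scheduleReduce(id);
        }
        stale.rampDownAllWithoutAskUpdate();
        updated.rampDownAllWithAskUpdate();
        System.out.println("stale ask=" + stale.reduceAsk()
            + " scheduled=" + stale.scheduledCount());
        System.out.println("updated ask=" + updated.reduceAsk()
            + " scheduled=" + updated.scheduledCount());
    }
}
```

In the first case the model still advertises reduce containers although 
nothing is scheduled, which corresponds to the "Cannot assign container ... 
reduces.isEmpty=true" messages in the logs above. In the second case the ask 
drops to zero, so an RM in this model would have no reduce requests left to 
serve.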

  was:
In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled 
reduces map and move these reducers to pending. This change is not reflected 
in the ask, so the RM keeps allocating containers for them, and the AM cannot 
assign those containers because no reducer is scheduled (see the logs below 
the code).
If the ask is updated immediately, the RM can schedule mappers right away, 
which is the intention when we ramp down reducers.
If this is not handled, it can lead to map starvation, as pointed out in 
MAPREDUCE-6513.
{code}
LOG.info("Ramping down all scheduled reduces:"
    + scheduledRequests.reduces.size());
for (ContainerRequest req : scheduledRequests.reduces.values()) {
  pendingReduces.add(req);
}
scheduledRequests.reduces.clear();
{code}
{noformat}
2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
assigned : container_1437451211867_1485_01_000215
2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
container Container: [ContainerId: container_1437451211867_1485_01_000216, 
NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: hdszzdcxdat6g06u04p:26010, 
Resource: <memory:4096, vCores:1>, Priority: 10, Token: Token { kind: 
ContainerToken, service: 10.2.33.236:26009 }, ] for a reduce as either  
container memory less than required 4096 or no pending reduce tasks - 
reduces.isEmpty=true
2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
assigned : container_1437451211867_1485_01_000216
2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
container Container: [ContainerId: container_1437451211867_1485_01_000217, 
NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: hdszzdcxdat6g06u06p:26010, 
Resource: <memory:4096, vCores:1>, Priority: 10, Token: Token { kind: 
ContainerToken, service: 10.2.33.239:26009 }, ] for a reduce as either  
container memory less than required 4096 or no pending reduce tasks - 
reduces.isEmpty=true
{noformat}


> Job hangs as ask is not updated after ramping down of all reducers
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6514
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>            Priority: Critical
>         Attachments: MAPREDUCE-6514.01.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled 
> reduces map and move these reducers to pending. This change is not reflected 
> in the ask, so the RM keeps allocating containers for them, and the AM cannot 
> assign those containers because no reducer is scheduled (see the logs below 
> the code).
> If the ask is updated immediately, the RM can schedule mappers right away, 
> which is the intention when we ramp down reducers.
> The scheduler need not allocate containers for ramped-down reducers.
> If this is not handled, it can lead to map starvation, as pointed out in 
> MAPREDUCE-6513.
> {code}
> LOG.info("Ramping down all scheduled reduces:"
>     + scheduledRequests.reduces.size());
> for (ContainerRequest req : scheduledRequests.reduces.values()) {
>   pendingReduces.add(req);
> }
> scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000216, 
> NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u04p:26010, Resource: <memory:4096, vCores:1>, Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000217, 
> NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u06p:26010, Resource: <memory:4096, vCores:1>, Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
