[ https://issues.apache.org/jira/browse/MAPREDUCE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-2952:
-------------------------------------

    Attachment: MAPREDUCE-2952.patch

Updated patch.

AM Container crash at launch:

{noformat}
11/09/24 06:31:15 INFO mapreduce.Job: Running job: job_1316845853225_0001
11/09/24 06:31:16 INFO mapreduce.Job:  map 0% reduce 0%
11/09/24 06:31:17 INFO mapreduce.Job: Job job_1316845853225_0001 failed with state FAILED due to: Application application_1316845853225_0001 failed 1 times due to AM Container for appattempt_1316845853225_0001_000001 exited with  exitCode: -1000 due to: java.io.IOException: App initialization failed (24)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:143)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:799)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:261)
        at org.apache.hadoop.util.Shell.run(Shell.java:188)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:135)
        ... 1 more

.Failing this attempt.. Failing the application.
{noformat}

AM Container crash at runtime:

{noformat}
11/09/24 06:17:00 INFO mapreduce.Job: Job job_1316844992085_0001 failed with state FAILED due to: Application application_1316844992085_0001 failed 1 times due to AM Container for appattempt_1316844992085_0001_000001 exited with  exitCode: 137 due to: Container [pid=29154,containerID=container_1316844992085_0001_01_000001] is running beyond memory-limits. Current usage : 3432910848bytes. Limit : 2147483648bytes. Killing container. 
Dump of the process-tree for container_1316844992085_0001_01_000001 : 
        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
        |- 29165 29154 29154 29154 (java) 507 18 3367518208 28101 /grid/0/jdk/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/grid/0/dev/hrt_mr/hadoop/logs/application_1316844992085_0001/container_1316844992085_0001_01_000001 -Dyarn.app.mapreduce.container.log.filesize=0 -Xmx3000m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 
        |- 29154 28859 29154 29154 (bash) 0 0 65392640 280 /bin/bash -c /grid/0/jdk/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/grid/0/dev/hrt_mr/hadoop/logs/application_1316844992085_0001/container_1316844992085_0001_01_000001 -Dyarn.app.mapreduce.container.log.filesize=0 -Xmx3000m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/grid/0/dev/hrt_mr/hadoop/logs/application_1316844992085_0001/container_1316844992085_0001_01_000001/stdout 2>/grid/0/dev/hrt_mr/hadoop/logs/application_1316844992085_0001/container_1316844992085_0001_01_000001/stderr
   

Container killed on request. Exit code is 137
{noformat}
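
The figures in the runtime dump above are internally consistent: the two VMEM_USAGE(BYTES) values in the process tree add up to the reported "Current usage", and that total exceeds the 2147483648-byte limit. A minimal sketch of that arithmetic follows. The class and method names are hypothetical and chosen for illustration only; this is not the NodeManager's monitoring code.

```java
// Hypothetical illustration only; not NodeManager code.
public class ProcessTreeMemoryCheck {

    // Sum the virtual-memory usage of every process in the container's tree.
    static long totalVmemBytes(long[] perProcessVmemBytes) {
        long total = 0L;
        for (long bytes : perProcessVmemBytes) {
            total += bytes;
        }
        return total;
    }

    // The container is over its limit when the tree's total exceeds the limit.
    static boolean isOverLimit(long usedBytes, long limitBytes) {
        return usedBytes > limitBytes;
    }

    public static void main(String[] args) {
        // VMEM_USAGE(BYTES) copied from the dump: java (29165), bash (29154).
        long[] vmem = {3367518208L, 65392640L};
        long limit = 2147483648L; // "Limit" from the log, i.e. 2 GiB

        long used = totalVmemBytes(vmem);
        System.out.println("used=" + used + " limit=" + limit
                + " overLimit=" + isOverLimit(used, limit));
    }
}
```

The sum is 3432910848 bytes, which matches the "Current usage" in the message. The -Xmx3000m setting on the command line is itself larger than the 2 GiB limit, which would be consistent with this result. Exit code 137 is conventionally 128 + 9, that is, termination by SIGKILL.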


> Application failure diagnostics are not consumed in a couple of cases
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2952
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2952
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, resourcemanager
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2952.patch, MAPREDUCE-2952.patch, 
> MAPREDUCE-2952.patch, MAPREDUCE-2952.patch
>
>
> When a container crashes, the reason for the failure isn't propagated because 
> of a bug in _RMAppAttemptImpl.AMContainerCrashedTransition_, which simply 
> discards the container's diagnostics. Also, RMAppAttemptImpl.diagnostics is 
> never consumed.
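
To illustrate the behaviour the description asks for, here is a minimal sketch in which a crash handler keeps the container's diagnostics and exposes them to the caller. Every class and method name below is a hypothetical stand-in; this is a simplified model and not the RMAppAttemptImpl code or the attached patch.

```java
// Hypothetical, simplified model; these are not the real YARN classes.
public class AttemptDiagnosticsSketch {

    // Stand-in for the event delivered when the AM container finishes.
    static final class ContainerFinished {
        final int exitStatus;
        final String diagnostics;

        ContainerFinished(int exitStatus, String diagnostics) {
            this.exitStatus = exitStatus;
            this.diagnostics = diagnostics;
        }
    }

    // Stand-in for an application attempt that accumulates diagnostics.
    static final class Attempt {
        private final StringBuilder diagnostics = new StringBuilder();

        // On a crash, record the container's diagnostics instead of dropping them.
        void onAmContainerCrashed(ContainerFinished event) {
            diagnostics.append("AM Container exited with exitCode: ")
                       .append(event.exitStatus)
                       .append(" due to: ")
                       .append(event.diagnostics);
        }

        // Expose the accumulated text so an application-level message can use it.
        String getDiagnostics() {
            return diagnostics.toString();
        }
    }

    public static void main(String[] args) {
        Attempt attempt = new Attempt();
        attempt.onAmContainerCrashed(
                new ContainerFinished(137, "Container killed on request."));
        System.out.println(attempt.getDiagnostics());
    }
}
```

The point of the sketch is only that the text recorded at crash time must also be read back by whatever builds the final failure message; storing it without a consumer leaves the client with no explanation.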

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
