[ https://issues.apache.org/jira/browse/MAPREDUCE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated MAPREDUCE-2952:
-------------------------------------
Attachment: MAPREDUCE-2952.patch
Updated patch.
AM Container crash at launch:
{noformat}
11/09/24 06:31:15 INFO mapreduce.Job: Running job: job_1316845853225_0001
11/09/24 06:31:16 INFO mapreduce.Job: map 0% reduce 0%
11/09/24 06:31:17 INFO mapreduce.Job: Job job_1316845853225_0001 failed with
state FAILED due to: Application application_1316845853225_0001 failed 1 times
due to AM Container for appattempt_1316845853225_0001_000001 exited with
exitCode: -1000 due to: java.io.IOException: App initialization failed (24)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:143)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:799)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:261)
at org.apache.hadoop.util.Shell.run(Shell.java:188)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:135)
... 1 more
.Failing this attempt.. Failing the application.
{noformat}
AM Container crash at runtime:
{noformat}
11/09/24 06:17:00 INFO mapreduce.Job: Job job_1316844992085_0001 failed with
state FAILED due to: Application application_1316844992085_0001 failed 1 times
due to AM Container for appattempt_1316844992085_0001_000001 exited with
exitCode: 137 due to: Container
[pid=29154,containerID=container_1316844992085_0001_01_000001] is running
beyond memory-limits. Current usage : 3432910848bytes. Limit : 2147483648bytes.
Killing container.
Dump of the process-tree for container_1316844992085_0001_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 29165 29154 29154 29154 (java) 507 18 3367518208 28101
/grid/0/jdk/bin/java -Dlog4j.configuration=container-log4j.properties
-Dyarn.app.mapreduce.container.log.dir=/grid/0/dev/hrt_mr/hadoop/logs/application_1316844992085_0001/container_1316844992085_0001_01_000001
-Dyarn.app.mapreduce.container.log.filesize=0 -Xmx3000m
org.apache.hadoop.mapreduce.v2.app.MRAppMaster
|- 29154 28859 29154 29154 (bash) 0 0 65392640 280 /bin/bash -c
/grid/0/jdk/bin/java -Dlog4j.configuration=container-log4j.properties
-Dyarn.app.mapreduce.container.log.dir=/grid/0/dev/hrt_mr/hadoop/logs/application_1316844992085_0001/container_1316844992085_0001_01_000001
-Dyarn.app.mapreduce.container.log.filesize=0 -Xmx3000m
org.apache.hadoop.mapreduce.v2.app.MRAppMaster
1>/grid/0/dev/hrt_mr/hadoop/logs/application_1316844992085_0001/container_1316844992085_0001_01_000001/stdout
2>/grid/0/dev/hrt_mr/hadoop/logs/application_1316844992085_0001/container_1316844992085_0001_01_000001/stderr
Container killed on request. Exit code is 137
{noformat}
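For readers checking the numbers in the runtime dump above: the reported usage appears to be the sum of the VMEM_USAGE(BYTES) column over the two processes in the tree, and exit code 137 matches the usual shell convention of 128 plus signal 9 (SIGKILL). The sketch below only redoes that arithmetic with the values from the log; the class and method names are hypothetical and this is not the NodeManager's monitoring code.

```java
// Hypothetical helper that re-derives the figures printed in the dump above.
class MemoryLimitSketch {

    // Sum the per-process virtual memory figures of a process tree.
    static long totalVmem(long[] perProcessVmemBytes) {
        long total = 0L;
        for (long bytes : perProcessVmemBytes) {
            total += bytes;
        }
        return total;
    }

    // A container is over its limit when the tree's usage exceeds the limit.
    static boolean overLimit(long usageBytes, long limitBytes) {
        return usageBytes > limitBytes;
    }

    // Conventional exit status of a process terminated by a signal.
    static int exitCodeForSignal(int signal) {
        return 128 + signal;
    }

    public static void main(String[] args) {
        // VMEM_USAGE(BYTES) of the java and bash processes, copied from the log.
        long[] vmem = {3367518208L, 65392640L};
        long limit = 2147483648L; // "Limit" from the log, i.e. 2 GiB

        long usage = totalVmem(vmem);
        System.out.println("usage=" + usage + " overLimit=" + overLimit(usage, limit));
        System.out.println("exit code for SIGKILL=" + exitCodeForSignal(9));
    }
}
```

Note that the AM was started with -Xmx3000m, so its heap ceiling alone is larger than the 2 GiB container limit shown in the log.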
> Application failure diagnostics are not consumed in a couple of cases
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-2952
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2952
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, resourcemanager
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Arun C Murthy
> Priority: Blocker
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2952.patch, MAPREDUCE-2952.patch,
> MAPREDUCE-2952.patch, MAPREDUCE-2952.patch
>
>
> When a container crashes, the reason for the failure isn't propagated because of a
> bug in _RMAppAttemptImpl.AMContainerCrashedTransition_, which simply discards
> the container's diagnostics. Also, RMAppAttemptImpl.diagnostics is never
> consumed.
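
To illustrate the kind of bug described above, here is a minimal sketch of a crash transition that drops the container's diagnostics next to one that records them so a caller can read them later. Every name in it (AttemptDiagnosticsSketch, onAmContainerCrashed, getDiagnostics, and so on) is hypothetical; it is not the code from RMAppAttemptImpl or from the attached patch.

```java
// Hypothetical stand-in for an application attempt; NOT the real RMAppAttemptImpl.
class AttemptDiagnosticsSketch {
    private final StringBuilder diagnostics = new StringBuilder();
    private String state = "LAUNCHED";

    // Buggy shape: the state changes, but the container's diagnostics are discarded.
    void onAmContainerCrashedDiscarding(String containerDiagnostics, int exitCode) {
        state = "FAILED";
    }

    // Intended shape: keep the diagnostics so they can be reported to the client.
    void onAmContainerCrashed(String containerDiagnostics, int exitCode) {
        diagnostics.append("AM Container exited with exitCode: ")
                   .append(exitCode)
                   .append(" due to: ")
                   .append(containerDiagnostics);
        state = "FAILED";
    }

    // The stored diagnostics are only useful if something actually reads them.
    String getDiagnostics() {
        return diagnostics.toString();
    }

    String getState() {
        return state;
    }

    public static void main(String[] args) {
        AttemptDiagnosticsSketch buggy = new AttemptDiagnosticsSketch();
        buggy.onAmContainerCrashedDiscarding("Container killed on request.", 137);
        System.out.println("buggy: [" + buggy.getDiagnostics() + "]");

        AttemptDiagnosticsSketch fixed = new AttemptDiagnosticsSketch();
        fixed.onAmContainerCrashed("Container killed on request.", 137);
        System.out.println("fixed: [" + fixed.getDiagnostics() + "]");
    }
}
```

Both variants end in the same failed state; only the second leaves the client anything to print, which is the difference visible in the job output quoted earlier in this message.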