[ https://issues.apache.org/jira/browse/MAPREDUCE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589925#comment-14589925 ]
Hari Sekhon commented on MAPREDUCE-6401:
----------------------------------------
Actually, the task logs showed the same thing, with not much to go on:
{code}
Exception from container-launch.
Container id: container_e199_1434474871820_0001_02_000019
Exit code: 7
Stack trace: ExitCodeException exitCode=7:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:293)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : user is <custom_scrubbed>
main : requested yarn user is <custom_scrubbed>

Container exited with a non-zero exit code 7
{code}
but the full task logs don't seem to have been retained by the history server.
This made me suspicious, so I reset the logging locations to try to get my hands
on the full logs, and after a YARN restart jobs started working normally again
without failed tasks or container launches. I'm very certain that the cluster
used to log to the directory I reset it to, so perhaps Ambari had a bug that
lost the location and reset it to debug locations that didn't work properly (it
wouldn't be the first time, e.g. AMBARI-9022).
I think we should leave this open as a minor to-do to improve debugging
information, especially when launching shell commands and encountering non-zero
exit codes. Logging is king.
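
As a rough illustration of the kind of diagnostics I mean, here is a minimal sketch that captures a child process's stdout and stderr and attaches them to the exception along with the exit code. This is not Hadoop's actual code and not how LinuxContainerExecutor is implemented. LaunchDiagnostics and CommandFailedException are hypothetical names made up for this example, and the failing "sh -c" command in main is only a stand-in for a launch script.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Hadoop): runs a command and keeps its output for diagnostics. */
final class LaunchDiagnostics {

    /** Hypothetical exception type carrying the exit code; the captured output is in the message. */
    public static class CommandFailedException extends IOException {
        private final int exitCode;

        public CommandFailedException(List<String> command, int exitCode, String output) {
            super("Command " + command + " exited with code " + exitCode
                    + "; output:\n" + output);
            this.exitCode = exitCode;
        }

        public int getExitCode() {
            return exitCode;
        }
    }

    /**
     * Runs the command and returns its combined stdout/stderr.
     * On a non-zero exit code, throws an exception that includes that output.
     */
    public static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout so error text is not lost
        Process process = builder.start();

        // Drain the output fully before waiting, so the child cannot block on a full pipe.
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new CommandFailedException(command, exitCode, output.toString());
        }
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        try {
            // Stand-in for a failing launch script: writes a reason to stderr, then exits with 7.
            run(Arrays.asList("sh", "-c", "echo 'cannot create log dir' >&2; exit 7"));
        } catch (CommandFailedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With something along these lines, the exception message would carry whatever the launched command printed, rather than only "exitCode=7".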
> Container-launch failure gives no debugging output
> --------------------------------------------------
>
> Key: MAPREDUCE-6401
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6401
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.6.0
> Environment: HDP 2.2
> Reporter: Hari Sekhon
> Attachments: job.log
>
>
> MR jobs are failing on my cluster with Stack trace: ExitCodeException
> exitCode=7 but little else in terms of debugging information. Can we please
> improve the debugging info? Log file is attached.