[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935060#comment-13935060
 ] 

Jason Lowe commented on MAPREDUCE-5792:
---------------------------------------

bq. So maybe the better fix here is to get the RM to pull the logs off of HDFS 
instead of linking to the NM?

The problem with this approach is that the RM may have difficulty knowing when 
log aggregation has completed, and therefore whether it should keep linking to 
the NM or redirect to the log server.
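
To make the decision concrete, here is a minimal sketch of the choice the RM would face. Everything in it is hypothetical: {{LogLinkChooser}}, {{AggregationState}} and {{chooseLogUrl}} are illustrative names, not YARN APIs, and the NM URL is a placeholder. It only shows that any state other than a confirmed completion leaves the RM pointing at the NM.

```java
/** Hypothetical sketch; none of these names exist in YARN. */
final class LogLinkChooser {

    /** What the RM might know about an application's log aggregation. */
    enum AggregationState { UNKNOWN, IN_PROGRESS, COMPLETED }

    /**
     * Picks the link to show for a container's logs.
     * Only a confirmed COMPLETED state makes the log server a safe target;
     * UNKNOWN and IN_PROGRESS both fall back to the NM link.
     */
    static String chooseLogUrl(AggregationState state,
                               String nmLogUrl,
                               String logServerUrl) {
        if (state == AggregationState.COMPLETED) {
            return logServerUrl;
        }
        return nmLogUrl;
    }

    public static void main(String[] args) {
        // Placeholder URLs; the log server value mirrors the setting quoted in this thread.
        String nm = "http://nm-host:port/node/containerlogs";
        String logServer = "http://jhs-server-name:port/jobhistory/nmlogs";
        for (AggregationState s : AggregationState.values()) {
            System.out.println(s + " -> " + chooseLogUrl(s, nm, logServer));
        }
    }
}
```

The UNKNOWN branch is the difficult case: the RM has no reliable signal, so whichever link it picks can be wrong.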

bq.  I'm not sure who's supposed to be handling log viewing besides the JHS 
which is specific to M/R jobs.

The JHS can serve logs even for non-MR jobs.  It was a hack to provide an 
aggregated log server before one existed.  Now in recent 2.x I believe the YARN 
Application History/Timeline Server can serve up logs as well.  On our 0.23 
clusters we are using the JHS to serve up aggregated logs, and 
yarn.log.server.url is configured to 
{noformat}http://jhs-server-name:port/jobhistory/nmlogs{noformat}

> When mapreduce.jobhistory.intermediate-done-dir isn't writable, application 
> fails with generic error
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5792
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5792
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 2.3.0
>            Reporter: Travis Thompson
>            Assignee: Mohammad Kamrul Islam
>
> When an application is run while the permissions on 
> {{mapreduce.jobhistory.intermediate-done-dir}} are wrong, the MapReduce AM 
> fails with a non-descriptive error message:
> {noformat}
> Application application_1394227890066_0004 failed 2 times due to AM Container 
> for appattempt_1394227890066_0004_000002 exited with exitCode: 1 due to: 
> Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
> at org.apache.hadoop.util.Shell.run(Shell.java:418)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:279)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> main : command provided 1
> main : user is tthompso
> main : requested yarn user is tthompso
> Container exited with a non-zero exit code 1
> .Failing this attempt.. Failing the application. 
> {noformat}
> When permissions are corrected on this dir, applications are able to run.  
> There should probably be some sort of check on this dir before launching the 
> AM so a more meaningful error message can be thrown.
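
The kind of check suggested above could look roughly like the sketch below. It is an assumption-laden illustration, not the fix for this issue: it uses {{java.nio.file}} against a local directory as a stand-in for HDFS, and {{DoneDirPreflight}} and {{check}} are hypothetical names. A real check would have to go through the Hadoop FileSystem API and run as the submitting user.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical pre-flight check; a local-filesystem stand-in for HDFS. */
final class DoneDirPreflight {

    /**
     * Returns an empty Optional when the directory looks usable,
     * otherwise a message naming the specific problem.
     */
    static Optional<String> check(Path dir) {
        if (!Files.exists(dir)) {
            return Optional.of("intermediate-done-dir does not exist: " + dir);
        }
        if (!Files.isDirectory(dir)) {
            return Optional.of("intermediate-done-dir is not a directory: " + dir);
        }
        if (!Files.isWritable(dir)) {
            return Optional.of("intermediate-done-dir is not writable by the current user: " + dir);
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // A fresh temporary directory stands in for a correctly configured dir.
        Path tmp = Files.createTempDirectory("intermediate-done-dir");
        System.out.println(check(tmp).orElse("ok: " + tmp));
        // A path that was never created stands in for a misconfigured dir.
        System.out.println(check(tmp.resolve("missing")).orElse("ok"));
        Files.delete(tmp);
    }
}
```

Returning a message instead of throwing keeps the sketch neutral about where the error would be surfaced, which the issue description leaves open.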



--
This message was sent by Atlassian JIRA
(v6.2#6252)
