Marcelo Vanzin created SPARK-24182:
--------------------------------------
Summary: Improve error message for client mode when AM fails
Key: SPARK-24182
URL: https://issues.apache.org/jira/browse/SPARK-24182
Project: Spark
Issue Type: Improvement
Components: YARN
Affects Versions: 2.3.0
Reporter: Marcelo Vanzin
Today, when the AM fails in client mode, very little useful information is
printed to the output. Depending on the type of failure, the information
provided by the YARN AM is not very useful either. For example, you'd see this
in the Spark shell:
{noformat}
18/05/04 11:07:38 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:86)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
[long stack trace]
{noformat}
Similarly, on the YARN RM, for certain failures you see a generic error like
this:
{noformat}
ExitCodeException exitCode=10:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
    at org.apache.hadoop.util.Shell.run(Shell.java:460)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:366)
    at [blah blah blah]
{noformat}
It would be nice if we could provide a more accurate description of what went
wrong when possible.
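
As a rough illustration of the kind of message that could be produced, here is a
minimal sketch in Python. It is not Spark code: the function names, the
application id, and the table of exit-code descriptions are all hypothetical and
exist only for this example. The sketch assumes the YARN diagnostics contain an
"exitCode=N" token, as in the RM output above, and it points the user at the
"yarn logs" command for the AM logs.

```python
import re
from typing import Optional

# Hypothetical table for illustration only. It is NOT Spark's real mapping of
# AM exit codes; a real implementation would take these from the AM itself.
EXIT_CODE_HINTS = {
    10: "the AM process terminated because of an uncaught exception",
}

# Matches the "exitCode=N" token seen in the YARN diagnostics text.
_EXIT_CODE_RE = re.compile(r"exitCode=(\d+)")


def extract_exit_code(diagnostics: str) -> Optional[int]:
    """Return the first exit code found in the diagnostics, or None."""
    match = _EXIT_CODE_RE.search(diagnostics)
    return int(match.group(1)) if match else None


def describe_am_failure(app_id: str, diagnostics: str) -> str:
    """Build a one-line description of an AM failure from YARN diagnostics."""
    code = extract_exit_code(diagnostics)
    if code is None:
        reason = "no exit code was found in the YARN diagnostics"
    else:
        hint = EXIT_CODE_HINTS.get(code, "no description is known for this code")
        reason = f"the AM container exited with code {code} ({hint})"
    return (
        f"Application {app_id} failed: {reason}. "
        f"Run 'yarn logs -applicationId {app_id}' to see the AM logs."
    )


if __name__ == "__main__":
    # The application id below is made up for the example.
    print(describe_am_failure(
        "application_0000000000000_0001",
        "ExitCodeException exitCode=10: at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)",
    ))
```

The point of the sketch is only that the exit code and a pointer to the AM logs
can be surfaced together in one message, instead of leaving the user with a
generic stack trace.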