[
https://issues.apache.org/jira/browse/SPARK-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
KaiXu updated SPARK-19569:
--------------------------
Summary: could not get APP ID and failed to connect to spark driver on
yarn-client mode (was: could not connect to spark driver on yarn-client mode)
> could not get APP ID and failed to connect to spark driver on yarn-client
> mode
> -------------------------------------------------------------------------------
>
> Key: SPARK-19569
> URL: https://issues.apache.org/jira/browse/SPARK-19569
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.0.2
> Environment: hadoop2.7.1
> spark2.0.2
> hive2.2
> Reporter: KaiXu
>
> when I run Hive queries on Spark, got below error in the console, after check
> the container's log, found it failed to connected to spark driver. I have set
> hive.spark.job.monitor.timeout=3600s, so the log said 'Job hasn't been
> submitted after 3601s', actually during this long-time period it's impossible
> no available resource, and also did not see any issue related to the network,
> so the cause is not clear from the message "Possible reasons include network
> issues, errors in remote driver or the cluster has no available resources,
> etc.".
> console log:
> Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
> Job hasn't been submitted after 3601s. Aborting it.
> Possible reasons include network issues, errors in remote driver or the
> cluster has no available resources, etc.
> Please check YARN or Spark driver's logs for further information.
> Status: SENT
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> container's log:
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources
> Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020
> file:
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
> } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility:
> PRIVATE, __spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port:
> 8020 file:
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip"
> } size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId:
> appattempt_1486905599813_0046_000002
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to:
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to:
> 17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(root); groups
> with view permissions: Set(); users with modify permissions: Set(root);
> groups with modify permissions: Set()
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be
> reachable.
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:57 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:57 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Uncaught exception:
> org.apache.spark.SparkException: Failed to connect to driver!
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:569)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:405)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:247)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:749)
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71)
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:747)
> at
> org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:774)
> at
> org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Final app status: FAILED,
> exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException:
> Failed to connect to driver!)
> 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Unregistering
> ApplicationMaster with FAILED (diag message: Uncaught exception:
> org.apache.spark.SparkException: Failed to connect to driver!)
> 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Deleting staging directory
> hdfs://hsx-node1:8020/user/root/.sparkStaging/application_1486905599813_0046
> 17/02/13 05:07:34 INFO util.ShutdownHookManager: Shutdown hook called
> hive's log:
> 2017-02-13T03:10:01,639 INFO [stderr-redir-1] client.SparkClientImpl:
> 17/02/13 03:10:01 INFO yarn.Client: Application report for
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T03:10:06,640 INFO [stderr-redir-1] client.SparkClientImpl:
> 17/02/13 03:10:06 INFO yarn.Client: Application report for
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T03:10:08,176 WARN [c807cf48-301a-47b4-96df-495b2827d6ba main]
> impl.RemoteSparkJobStatus: Failed to get APP ID.
> java.util.concurrent.TimeoutException
> at
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
> ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
> at
> org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getAppID(RemoteSparkJobStatus.java:65)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:114)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2168)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1824)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1511)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1222)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[?:1.8.0_60]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_60]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_60]
> at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> ~[hadoop-common-2.7.1.jar:?]
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> ~[hadoop-common-2.7.1.jar:?]
> 2017-02-13T03:10:11,641 INFO [stderr-redir-1] client.SparkClientImpl:
> 17/02/13 03:10:11 INFO yarn.Client: Application report for
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T03:10:16,643 INFO [stderr-redir-1] client.SparkClientImpl:
> 17/02/13 03:10:16 INFO yarn.Client: Application report for
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T04:11:07,354 INFO [stderr-redir-1] client.SparkClientImpl:
> 17/02/13 04:11:07 INFO yarn.Client: Application report for
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T04:11:09,706 WARN [c807cf48-301a-47b4-96df-495b2827d6ba main]
> impl.RemoteSparkJobStatus: Failed to get APP ID.
> java.util.concurrent.TimeoutException
> at
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
> ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
> at
> org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getAppID(RemoteSparkJobStatus.java:65)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:132)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2168)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1824)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1511)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1222)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212)
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642)
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[?:1.8.0_60]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_60]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_60]
> at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> ~[hadoop-common-2.7.1.jar:?]
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> ~[hadoop-common-2.7.1.jar:?]
> 2017-02-13T04:11:09,719 ERROR [c807cf48-301a-47b4-96df-495b2827d6ba main]
> ql.Driver: FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> 2017-02-13T04:11:09,720 INFO [c807cf48-301a-47b4-96df-495b2827d6ba main]
> ql.Driver: Completed executing
> command(queryId=root_20170213030905_947e7eca-a134-4652-883c-10bded3c6218);
> Time taken: 3723.688 seconds
> 2017-02-13T04:11:09,800 INFO [c807cf48-301a-47b4-96df-495b2827d6ba main]
> conf.HiveConf: Using the default value passed in for log id:
> c807cf48-301a-47b4-96df-495b2827d6ba
> 2017-02-13T04:11:09,800 INFO [c807cf48-301a-47b4-96df-495b2827d6ba main]
> session.SessionState: Resetting thread name to main
> 2017-02-13T04:11:09,800 INFO [main] conf.HiveConf: Using the default value
> passed in for log id: c807cf48-301a-47b4-96df-495b2827d6ba
> 2017-02-13T04:11:12,355 INFO [stderr-redir-1] client.SparkClientImpl:
> 17/02/13 04:11:12 INFO yarn.Client: Application report for
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T04:11:17,356 INFO [stderr-redir-1] client.SparkClientImpl:
> 17/02/13 04:11:17 INFO yarn.Client: Application report for
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T04:11:19,811 WARN [c807cf48-301a-47b4-96df-495b2827d6ba main]
> client.SparkClientImpl: Timed out shutting down remote driver, interrupting...
> 2017-02-13T04:11:19,811 WARN [Driver] client.SparkClientImpl: Waiting thread
> interrupted, killing child process.
> 2017-02-13T04:11:19,839 INFO [main] session.SessionState: Deleted directory:
> /tmp/hive/root/c807cf48-301a-47b4-96df-495b2827d6ba on fs with scheme hdfs
> 2017-02-13T04:11:19,839 INFO [main] session.SessionState: Deleted directory:
> /tmp/root/c807cf48-301a-47b4-96df-495b2827d6ba on fs with scheme file
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]