[ https://issues.apache.org/jira/browse/HIVE-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863645#comment-15863645 ]

Rui Li commented on HIVE-15887:
-------------------------------

It seems the application was ACCEPTED but never reached RUNNING. The YARN logs 
may help explain why the app was never launched.
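
If log aggregation is enabled, the relevant logs can be pulled with the YARN 
CLI (the application ID below is the one from the report):

    yarn application -status application_1486905599813_0046
    yarn logs -applicationId application_1486905599813_0046

The first command prints the diagnostics the ResourceManager recorded for the 
attempt; the second dumps the aggregated container logs, including the AM's 
stderr.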

> could not get APP ID, causing failure to connect to Spark driver in 
> yarn-client mode
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-15887
>                 URL: https://issues.apache.org/jira/browse/HIVE-15887
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Spark
>    Affects Versions: 2.2.0
>         Environment: Hive2.2
> Spark2.0.2
> hadoop2.7.1
>            Reporter: KaiXu
>
> When I run Hive queries on Spark, I get the error below in the console. 
> After checking the container's log, I found that the ApplicationMaster 
> failed to connect to the Spark driver. I have set 
> hive.spark.job.monitor.timeout=3600s, which is why the log says 'Job hasn't 
> been submitted after 3601s'. It is implausible that the cluster had no 
> available resources for that entire period, and I saw no network problems 
> either, so the actual cause is not clear from the generic message "Possible 
> reasons include network issues, errors in remote driver or the cluster has 
> no available resources, etc.".
> Hive's log shows it failed to get the APP ID, so this might be why the 
> driver did not start up.
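> For reference, the monitor timeout was set in the Hive session as shown 
> below; the two client-side timeouts under it are related settings that bound 
> how long Hive waits on the remote driver (shown with their defaults; names 
> and values worth double-checking against HiveConf for this build):
>
>     set hive.spark.job.monitor.timeout=3600s;
>     set hive.spark.client.server.connect.timeout=90000ms;
>     set hive.spark.client.future.timeout=60s;
>
> Note the monitor timeout only controls how long Hive waits for the job to be 
> submitted; it cannot explain why the driver never came up.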
> console log:
> Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
> Job hasn't been submitted after 3601s. Aborting it.
> Possible reasons include network issues, errors in remote driver or the 
> cluster has no available resources, etc.
> Please check YARN or Spark driver's logs for further information.
> Status: SENT
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> container's log:
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources 
> Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 
> file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
>  } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: 
> PRIVATE, __spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 
> 8020 file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" 
> } size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
> appattempt_1486905599813_0046_000002
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(root); groups 
> with view permissions: Set(); users  with modify permissions: Set(root); 
> groups with modify permissions: Set()
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be 
> reachable.
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> [the same 'Failed to connect to driver at 192.168.1.1:43656, retrying ...' 
> error repeats 24 more times through 05:05:57, and three more times at 
> 05:07:34; identical lines omitted]
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!
>       at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:569)
>       at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:405)
>       at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:247)
>       at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:749)
>       at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71)
>       at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70)
>       at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:747)
>       at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:774)
>       at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Final app status: FAILED, 
> exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: 
> Failed to connect to driver!)
> 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Unregistering 
> ApplicationMaster with FAILED (diag message: Uncaught exception: 
> org.apache.spark.SparkException: Failed to connect to driver!)
> 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Deleting staging directory 
> hdfs://hsx-node1:8020/user/root/.sparkStaging/application_1486905599813_0046
> 17/02/13 05:07:34 INFO util.ShutdownHookManager: Shutdown hook called
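> In yarn-client mode the driver runs inside the Hive client process, so the 
> AM on the cluster must be able to connect back to 192.168.1.1:43656, the 
> address the driver advertised. A hypothesis worth ruling out is that this 
> address is unreachable from the NodeManager hosts, or that the driver bound 
> to the wrong interface. Spark's standard properties can pin the advertised 
> address and port explicitly; a sketch of what to try from the Hive session 
> (Hive on Spark passes spark.* properties through to the driver it launches), 
> not a confirmed fix:
>
>     set spark.driver.host=<a hostname reachable from the YARN nodes>;
>     set spark.driver.port=43656;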
> hive's log:
> 2017-02-13T03:10:01,639  INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/13 03:10:01 INFO yarn.Client: Application report for 
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T03:10:06,640  INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/13 03:10:06 INFO yarn.Client: Application report for 
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T03:10:08,176  WARN [c807cf48-301a-47b4-96df-495b2827d6ba main] 
> impl.RemoteSparkJobStatus: Failed to get APP ID.
> java.util.concurrent.TimeoutException
>         at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
>         at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getAppID(RemoteSparkJobStatus.java:65) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:114) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2168) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1824) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1511) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1222) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60]
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_60]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60]
>         at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:221) ~[hadoop-common-2.7.1.jar:?]
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:136) ~[hadoop-common-2.7.1.jar:?]
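> The TimeoutException above is Hive waiting on the remote driver for the YARN 
> application ID; since the driver never connected, that future can never 
> complete. A minimal, self-contained sketch of the failure shape, using plain 
> java.util.concurrent rather than Hive's actual RPC types:
>
>     import java.util.concurrent.CompletableFuture;
>     import java.util.concurrent.TimeUnit;
>     import java.util.concurrent.TimeoutException;
>
>     public class GetAppIdSketch {
>         public static void main(String[] args) throws Exception {
>             // Stands in for the future Hive receives when it asks the
>             // remote driver for its application ID.
>             CompletableFuture<String> appId = new CompletableFuture<>();
>             try {
>                 // RemoteSparkJobStatus.getAppID performs a bounded get like this.
>                 System.out.println("APP ID: " + appId.get(5, TimeUnit.SECONDS));
>             } catch (TimeoutException e) {
>                 // With no driver to complete the future, a timeout is the
>                 // only possible outcome; hence the 'Failed to get APP ID'
>                 // warning in the log.
>                 System.out.println("Failed to get APP ID: " + e);
>             }
>         }
>     }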
> 2017-02-13T03:10:11,641  INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/13 03:10:11 INFO yarn.Client: Application report for 
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T03:10:16,643  INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/13 03:10:16 INFO yarn.Client: Application report for 
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T04:11:07,354  INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/13 04:11:07 INFO yarn.Client: Application report for 
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T04:11:09,706  WARN [c807cf48-301a-47b4-96df-495b2827d6ba main] 
> impl.RemoteSparkJobStatus: Failed to get APP ID.
> java.util.concurrent.TimeoutException
>         at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
>         at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getAppID(RemoteSparkJobStatus.java:65) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:132) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         [remainder identical to the stack trace above]
> 2017-02-13T04:11:09,719 ERROR [c807cf48-301a-47b4-96df-495b2827d6ba main] 
> ql.Driver: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> 2017-02-13T04:11:09,720  INFO [c807cf48-301a-47b4-96df-495b2827d6ba main] 
> ql.Driver: Completed executing 
> command(queryId=root_20170213030905_947e7eca-a134-4652-883c-10bded3c6218); 
> Time taken: 3723.688 seconds
> 2017-02-13T04:11:09,800  INFO [c807cf48-301a-47b4-96df-495b2827d6ba main] 
> conf.HiveConf: Using the default value passed in for log id: 
> c807cf48-301a-47b4-96df-495b2827d6ba
> 2017-02-13T04:11:09,800  INFO [c807cf48-301a-47b4-96df-495b2827d6ba main] 
> session.SessionState: Resetting thread name to  main
> 2017-02-13T04:11:09,800  INFO [main] conf.HiveConf: Using the default value 
> passed in for log id: c807cf48-301a-47b4-96df-495b2827d6ba
> 2017-02-13T04:11:12,355  INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/13 04:11:12 INFO yarn.Client: Application report for 
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T04:11:17,356  INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/13 04:11:17 INFO yarn.Client: Application report for 
> application_1486905599813_0046 (state: ACCEPTED)
> 2017-02-13T04:11:19,811  WARN [c807cf48-301a-47b4-96df-495b2827d6ba main] 
> client.SparkClientImpl: Timed out shutting down remote driver, interrupting...
> 2017-02-13T04:11:19,811  WARN [Driver] client.SparkClientImpl: Waiting thread 
> interrupted, killing child process.
> 2017-02-13T04:11:19,839  INFO [main] session.SessionState: Deleted directory: 
> /tmp/hive/root/c807cf48-301a-47b4-96df-495b2827d6ba on fs with scheme hdfs
> 2017-02-13T04:11:19,839  INFO [main] session.SessionState: Deleted directory: 
> /tmp/root/c807cf48-301a-47b4-96df-495b2827d6ba on fs with scheme file



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
