[ https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857738#comment-15857738 ]
KaiXu commented on HIVE-15671:
------------------------------

I may have encountered the situation you mentioned. I ran a query with Hive on Spark, and it failed with the following error:

{code}
2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041 Stage-3_0: 796(+456)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041 Stage-3_0: 914(+398)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
{code}

The driver had indeed failed, for some unknown reason:

{code}
17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1169.0 in stage 3.0 (TID 2519)
17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown (hsx-node1:42777) driver disconnected.
17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.1.1:42777 disassociated! Shutting down.
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1105.0 in stage 3.0 (TID 2511)
17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk6/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-71da1dfc-99bd-4687-bc2f-33452db8de3d
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk2/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-7f134d81-e77e-4b92-bd99-0a51d0962c14
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk5/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-77a90d63-fb05-4bc6-8d5e-1562cc502e6c
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk4/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-91f8b91a-114d-4340-8560-d3cd085c1cd4
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a3c24f9e-8609-48f0-9d37-0de7ae06682a
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk7/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-f6120a43-2158-4780-927c-c5786b78f53e
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk3/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-e17931ad-9e8a-45da-86f8-9a0fdca0fad1
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-4de34175-f871-4c28-8ec0-d2fc0020c5c3
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1137.0 in stage 3.0 (TID 2515)
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 897.0 in stage 3.0 (TID 2417)
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1225.0 in stage 3.0 (TID 2526)
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 905.0 in stage 3.0 (TID 2423)
{code}

In Hive's log:

{code}
2017-02-08T09:51:04,327 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO scheduler.TaskSetManager: Finished task 971.0 in stage 3.0 (TID 2218) in 5948 ms on hsx-node8 (1338/1520)
2017-02-08T09:51:04,346 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO rpc.RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (org.apache.hive.spark.client.RemoteDriver$DriverProtocol.handle(io.netty.channel.ChannelHandlerContext, org.apache.hive.spark.client.rpc.Rpc$MessageHeader)).
2017-02-08T09:51:04,346 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN rpc.RpcDispatcher: [DriverProtocol] Expected RPC header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead.
2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO rpc.RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (null).
2017-02-08T09:51:04,347 INFO [RPC-Handler-3] rpc.RpcDispatcher: [ClientProtocol] Closing channel due to exception in pipeline (Connection reset by peer).
2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 	at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN rpc.Rpc: Failed to send RPC, closing connection.
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: java.nio.channels.ClosedChannelException
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN client.RemoteDriver: Shutting down driver because RPC channel was closed.
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO client.RemoteDriver: Shutting down remote driver.
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 	at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO scheduler.DAGScheduler: Asked to cancel job 2
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.InterruptedException
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at java.lang.Object.wait(Native Method)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at java.lang.Object.wait(Object.java:502)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.SimpleFutureAction.org$apache$spark$SimpleFutureAction$$awaitResult(FutureAction.scala:165)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:120)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:108)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at scala.concurrent.Await$.ready(package.scala:86)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:303)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at java.lang.Thread.run(Thread.java:745)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: 	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: 	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
2017-02-08T09:51:04,654 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-65f40590-d87f-4701-b374-6b3b2a11538c
2017-02-08T09:52:04,346 WARN [b723c85d-2a7b-469e-bab1-9c165b25e656 main] impl.RemoteSparkJobStatus: Error getting stage info
java.util.concurrent.TimeoutException
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
	at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageInfo(RemoteSparkJobStatus.java:161) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageProgress(RemoteSparkJobStatus.java:96) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.spark.status.RemoteSparkJobMonitor.startMonitor(RemoteSparkJobMonitor.java:82) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef.monitorJob(RemoteSparkJobRef.java:60) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:101) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1997) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1688) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1419) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1143) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_60]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60]
	at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221) ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136) ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
2017-02-08T09:52:04,346 ERROR [b723c85d-2a7b-469e-bab1-9c165b25e656 main] status.SparkJobMonitor: Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
java.lang.IllegalStateException: RPC channel is closed.
	at com.google.common.base.Preconditions.checkState(Preconditions.java:149) ~[guava-14.0.1.jar:?]
	at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
{code}

Also, in the container's log, I found that the driver was still requesting executors:

{code}
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 77 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 76 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 75 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 74 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 73 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 71 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 70 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 50 executor(s).
17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. 192.168.1.1:42777
17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. hsx-node1:42777
17/02/08 09:51:04 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
17/02/08 09:51:04 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
17/02/08 09:51:04 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
17/02/08 09:51:04 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1486453422616_0150
17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
{code}

So, is this situation the client-to-server timeout? I have only set hive.spark.job.monitor.timeout=3600s.

> RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15671
>                 URL: https://issues.apache.org/jira/browse/HIVE-15671
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
> {code}
>   /**
>    * Tells the RPC server to expect a connection from a new client.
>    * ...
>    */
>   public Future<Rpc> registerClient(final String clientId, String secret,
>       RpcDispatcher serverDispatcher) {
>     return registerClient(clientId, secret, serverDispatcher,
>         config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of *hive.spark.client.server.connect.timeout*, which is meant as the timeout for the handshake between the Hive client and the remote Spark driver. Instead, the timeout should be *hive.spark.client.connect.timeout*, which is the timeout for the remote Spark driver connecting back to the Hive client.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
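For reference, hive.spark.job.monitor.timeout is not the only knob involved here: the issue distinguishes two other timeouts, both of which can be set per session from the Hive CLI. The values below are purely illustrative, not recommendations:

{code}
-- Timeout for the remote Spark driver connecting back to the Hive client
set hive.spark.client.connect.timeout=30000ms;
-- Timeout for the handshake between the Hive client and the remote Spark driver
set hive.spark.client.server.connect.timeout=300000ms;
-- Timeout for the Spark job monitor (the setting mentioned above)
set hive.spark.job.monitor.timeout=3600s;
{code}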
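The bug in the quoted registerClient() overload can be illustrated with a minimal self-contained sketch. The RpcConfig class below is a hypothetical stand-in for Hive's RpcConfiguration; only the two property names and the two getter names mirror the real code, and the default values mirror the documented defaults (90s vs. 1s):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hive's RpcConfiguration.
class RpcConfig {
    private final Map<String, Long> timeoutsMs = new HashMap<>();

    RpcConfig(long serverConnectTimeoutMs, long connectTimeoutMs) {
        // Handshake timeout between the Hive client and the remote Spark driver.
        timeoutsMs.put("hive.spark.client.server.connect.timeout", serverConnectTimeoutMs);
        // Timeout for the remote Spark driver connecting back to the Hive client.
        timeoutsMs.put("hive.spark.client.connect.timeout", connectTimeoutMs);
    }

    long getServerConnectTimeoutMs() {
        return timeoutsMs.get("hive.spark.client.server.connect.timeout");
    }

    long getConnectTimeoutMs() {
        return timeoutsMs.get("hive.spark.client.connect.timeout");
    }
}

public class RegisterClientTimeout {
    // Before the patch: the convenience overload forwarded the handshake
    // timeout, so registerClient() waited on the wrong setting.
    static long timeoutBeforePatch(RpcConfig config) {
        return config.getServerConnectTimeoutMs();
    }

    // After the patch: it forwards the connection timeout instead.
    static long timeoutAfterPatch(RpcConfig config) {
        return config.getConnectTimeoutMs();
    }

    public static void main(String[] args) {
        RpcConfig config = new RpcConfig(90_000L, 1_000L);
        System.out.println(timeoutBeforePatch(config)); // prints 90000
        System.out.println(timeoutAfterPatch(config));  // prints 1000
    }
}
```

The point of the sketch is only that the two settings are easy to conflate because both getters live on the same config object; the patch changes which one the no-timeout overload forwards.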