[ https://issues.apache.org/jira/browse/HIVE-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-15859:
--------------------------
    Affects Version/s:     (was: 2.2.0)
                       2.1.1

> HoS: Write RPC messages in event loop
> -------------------------------------
>
>                 Key: HIVE-15859
>                 URL: https://issues.apache.org/jira/browse/HIVE-15859
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Spark
>    Affects Versions: 2.1.1
>         Environment: hadoop2.7.1
> spark1.6.2
> hive2.2
>            Reporter: KaiXu
>            Assignee: Rui Li
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15859.1.patch, HIVE-15859.2.patch, 
> HIVE-15859.3.patch
>
>
> Hive on Spark failed with the following error:
> {noformat}
> 2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041 Stage-3_0: 796(+456)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041 Stage-3_0: 914(+398)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 
> 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC 
> channel is closed.)'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}
> The application log shows the driver commanded a shutdown for some unknown 
> reason, but Hive's log shows the driver could not get the RPC header 
> (Expected RPC header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage 
> instead).
> {noformat}
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1169.0 in 
> stage 3.0 (TID 2519)
> 17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver 
> commanded a shutdown
> 17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
> 17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown 
> (hsx-node1:42777) driver disconnected.
> 17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 
> 192.168.1.1:42777 disassociated! Shutting down.
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1105.0 in 
> stage 3.0 (TID 2511)
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Shutting down remote daemon.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk6/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-71da1dfc-99bd-4687-bc2f-33452db8de3d
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk2/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-7f134d81-e77e-4b92-bd99-0a51d0962c14
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk5/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-77a90d63-fb05-4bc6-8d5e-1562cc502e6c
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remote daemon shut down; proceeding with flushing remote transports.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk4/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-91f8b91a-114d-4340-8560-d3cd085c1cd4
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a3c24f9e-8609-48f0-9d37-0de7ae06682a
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remoting shut down.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk7/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-f6120a43-2158-4780-927c-c5786b78f53e
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk3/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-e17931ad-9e8a-45da-86f8-9a0fdca0fad1
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-4de34175-f871-4c28-8ec0-d2fc0020c5c3
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1137.0 in 
> stage 3.0 (TID 2515)
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 897.0 in stage 
> 3.0 (TID 2417)
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1225.0 in 
> stage 3.0 (TID 2526)
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 905.0 in stage 
> 3.0 (TID 2423)
> {noformat}
> In Hive's log:
> {noformat}
> 2017-02-08T09:51:04,327 INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/08 09:51:04 INFO scheduler.TaskSetManager: Finished task 971.0 in stage 
> 3.0 (TID 2218) in 5948 ms on hsx-node8 (1338/1520)
> 2017-02-08T09:51:04,346 INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/08 09:51:04 INFO rpc.RpcDispatcher: [DriverProtocol] Closing channel 
> due to exception in pipeline 
> (org.apache.hive.spark.client.RemoteDriver$DriverProtocol.handle(io.netty.channel.ChannelHandlerContext,
>  org.apache.hive.spark.client.rpc.Rpc$MessageHeader)).
> 2017-02-08T09:51:04,346 INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/08 09:51:04 WARN rpc.RpcDispatcher: [DriverProtocol] Expected RPC 
> header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead.
> 2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/08 09:51:04 INFO rpc.RpcDispatcher: [DriverProtocol] Closing channel 
> due to exception in pipeline (null).
> 2017-02-08T09:51:04,347 INFO [RPC-Handler-3] rpc.RpcDispatcher: 
> [ClientProtocol] Closing channel due to exception in pipeline (Connection 
> reset by peer).
> 2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener 
> threw an exception
> 2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 
> java.lang.IllegalStateException: RPC channel is closed.
> 2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/08 09:51:04 WARN rpc.Rpc: Failed to send RPC, closing connection.
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 
> java.nio.channels.ClosedChannelException
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/08 09:51:04 WARN client.RemoteDriver: Shutting down driver because RPC 
> channel was closed.
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/08 09:51:04 INFO client.RemoteDriver: Shutting down remote driver.
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener 
> threw an exception
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 
> java.lang.IllegalStateException: RPC channel is closed.
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at 
> scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/08 09:51:04 INFO scheduler.DAGScheduler: Asked to cancel job 2
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener 
> threw an exception
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 
> java.lang.InterruptedException
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> java.lang.Object.wait(Native Method)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> java.lang.Object.wait(Object.java:502)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.SimpleFutureAction.org$apache$spark$SimpleFutureAction$$awaitResult(FutureAction.scala:165)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:120)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:108)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> scala.concurrent.Await$.ready(package.scala:86)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:303)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> java.lang.Thread.run(Thread.java:745)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener 
> threw an exception
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 
> java.lang.IllegalStateException: RPC channel is closed.
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at 
> scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
> 2017-02-08T09:51:04,654 INFO [stderr-redir-1] client.SparkClientImpl: 
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /tmp/spark-65f40590-d87f-4701-b374-6b3b2a11538c
> 2017-02-08T09:52:04,346 WARN [b723c85d-2a7b-469e-bab1-9c165b25e656 main] 
> impl.RemoteSparkJobStatus: Error getting stage info
> java.util.concurrent.TimeoutException
> at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageInfo(RemoteSparkJobStatus.java:161)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageProgress(RemoteSparkJobStatus.java:96)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.spark.status.RemoteSparkJobMonitor.startMonitor(RemoteSparkJobMonitor.java:82)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef.monitorJob(RemoteSparkJobRef.java:60)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:101) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1997) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1688) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1419) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1143) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) 
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) 
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) 
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) 
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) 
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) 
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) 
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) 
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) 
> ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_60]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_60]
> at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
> ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 
> ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
> 2017-02-08T09:52:04,346 ERROR [b723c85d-2a7b-469e-bab1-9c165b25e656 main] 
> status.SparkJobMonitor: Failed to monitor Job[ 2] with exception 
> 'java.lang.IllegalStateException(RPC channel is closed.)'
> java.lang.IllegalStateException: RPC channel is closed.
> at com.google.common.base.Preconditions.checkState(Preconditions.java:149) 
> ~[guava-14.0.1.jar:?]
> at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}
> Also, in the container's log, I found the driver still requesting executors:
> {noformat}
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 
> 77 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor 
> containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 
> 76 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor 
> containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 
> 75 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor 
> containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 
> 74 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor 
> containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 
> 73 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor 
> containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 
> 71 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor 
> containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 
> 70 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor 
> containers
> 17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 
> 50 executor(s).
> 17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor 
> containers
> 17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, 
> but found none.
> 17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 
> 0 executor(s).
> 17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor 
> containers
> 17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, 
> but found none.
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated 
> or disconnected! Shutting down. 192.168.1.1:42777
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated 
> or disconnected! Shutting down. hsx-node1:42777
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, 
> exitCode: 0
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster: Unregistering 
> ApplicationMaster with SUCCEEDED
> 17/02/08 09:51:04 INFO impl.AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster: Deleting staging directory 
> .sparkStaging/application_1486453422616_0150
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> {noformat}
> I found only one ERROR in the YARN application log:
> {noformat}
> 17/02/08 09:51:00 INFO executor.Executor: Finished task 1492.0 in stage 3.0 
> (TID 2168). 3294 bytes result sent to driver
> 17/02/08 09:51:00 INFO executor.Executor: Finished task 556.0 in stage 3.0 
> (TID 1587). 3312 bytes result sent to driver
> 17/02/08 09:51:00 INFO executor.Executor: Finished task 1412.0 in stage 3.0 
> (TID 2136). 3294 bytes result sent to driver
> 17/02/08 09:51:00 INFO executor.Executor: Finished task 1236.0 in stage 3.0 
> (TID 2007). 3294 bytes result sent to driver
> 17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver 
> commanded a shutdown
> 17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
> 17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
> 17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown 
> (hsx-node1:42777) driver disconnected.
> 17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 
> 192.168.1.1:42777 disassociated! Shutting down.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a8167f0b-f3c3-458f-ad51-8a0f4bcda4f3
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Shutting down remote daemon.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-26cba445-66d2-4b78-a428-17881c92f0f6
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remote daemon shut down; proceeding with flushing remote transports.
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remoting shut down.
> {noformat}
> This error occurs when several queries run concurrently at large data 
> scale. It does not occur when the query is run by itself, but it recurs 
> frequently when the queries are run together.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)