Sun Rui created SPARK-16522:
-------------------------------
Summary: [MESOS] Spark application throws exception on exit
Key: SPARK-16522
URL: https://issues.apache.org/jira/browse/SPARK-16522
Project: Spark
Issue Type: Bug
Components: Mesos
Affects Versions: 2.0.0
Reporter: Sun Rui
Spark applications running on Mesos throw exception upon exit as follows:
{panel}
16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message =
RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
org.apache.spark.SparkException: Exception thrown in awaitResult
at
org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at
org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
at
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
at
org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
at
org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
Caused by: org.apache.spark.SparkException: Could not find
CoarseGrainedScheduler.
at
org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
at
org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
at
org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
... 4 more
Exception in thread "Thread-47" org.apache.spark.SparkException: Error
notifying standalone scheduler's driver endpoint
at
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
at
org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
at
org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
Caused by: org.apache.spark.SparkException: Error sending message [message =
RemoveExecutor(1,Executor finished with state FINISHED)]
at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
at
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
... 2 more
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
at
org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at
org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
... 4 more
Caused by: org.apache.spark.SparkException: Could not find
CoarseGrainedScheduler.
at
org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
at
org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
at
org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
... 4 more
{panel}
Applications' result is not affected by this error.
This issue can be simply reproduced by launching a spark-shell, and exit after
running the following commands:
{code}
val rdd = sc.parallelize(1 to 10, 10)
rdd.map { _ + 1} collect
{code}
The root cause is that in SparkContext.stop(),
MesosCoarseGrainedSchedulerBackend.stop() calls
CoarseGrainedSchedulerBackend.stop(). The latter sends messages to stop
executors and also stop the driver endpoint without waiting for the actual stop
of executors. MesosCoarseGrainedSchedulerBackend.stop() still waits for the
executors to stop in a timeout. During the wait,
MesosCoarseGrainedSchedulerBackend.statusUpdate() generally will be called to
update executors' status, and in turn removeExecutor() is called. But at that
time, the driver endpoint is not available.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]