[ 
https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409502#comment-15409502
 ] 

Laurent Hoss commented on SPARK-16522:
--------------------------------------

Is this gonna be fixed soon (2.0.1) ?!
I just got a stacktrace of a developer here, who's job *failed* .. with 
following exception
{code}
2016-08-05 05:43:35,186 WARN  org.apache.spark.rpc.netty.NettyRpcEndpointRef: 
Error sending message [message = RemoveExecutor(5,Executor finished with state 
FINISHED)] in 1 attempts
org.apache.spark.SparkException: Exception thrown in awaitResult
        at 
org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at 
org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at 
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
        at 
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
        at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
        at 
org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
        at 
org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
Caused by: org.apache.spark.SparkException: Could not find 
CoarseGrainedScheduler.
        at 
org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
        at 
org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
        at 
org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
        at 
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
        ... 4 more
{code}


> [MESOS] Spark application throws exception on exit
> --------------------------------------------------
>
>                 Key: SPARK-16522
>                 URL: https://issues.apache.org/jira/browse/SPARK-16522
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 2.0.0
>            Reporter: Sun Rui
>
> Spark applications running on Mesos throw exception upon exit as follows:
> {noformat}
> 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
> org.apache.spark.SparkException: Exception thrown in awaitResult
>       at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>       at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>       at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>       at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>       at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>       at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>       at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>       at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>       at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>       at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>       at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>       at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>       at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>       at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>       at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>       at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
>       at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>       ... 4 more
> Exception in thread "Thread-47" org.apache.spark.SparkException: Error 
> notifying standalone scheduler's driver endpoint
>       at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
>       at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>       at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)]
>       at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
>       at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>       at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>       ... 2 more
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
>       at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>       at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>       at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>       at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>       at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>       at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>       at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>       at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>       ... 4 more
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>       at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>       at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>       at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>       at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
>       at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>       ... 4 more
> {noformat}
> Applications' result is not affected by this error.
> This issue can be simply reproduced by launching a spark-shell, and exit 
> after running the following commands:
> {code}
> val rdd = sc.parallelize(1 to 10, 10)
> rdd.map { _ + 1} collect
> {code}
> The root cause is that in SparkContext.stop(), 
> MesosCoarseGrainedSchedulerBackend.stop() calls 
> CoarseGrainedSchedulerBackend.stop(). The latter sends messages to stop 
> executors and also stop the driver endpoint without waiting for the actual 
> stop of executors. MesosCoarseGrainedSchedulerBackend.stop() still waits for 
> the executors to stop in a timeout. During the wait, 
> MesosCoarseGrainedSchedulerBackend.statusUpdate() generally will be called to 
> update executors' status, and in turn removeExecutor() is called. But at that 
> time, the driver endpoint is not available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to