[ https://issues.apache.org/jira/browse/SPARK-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241951#comment-14241951 ]
Ilayaperumal Gopinathan commented on SPARK-2892:
------------------------------------------------
To add more info:
When the ReceiverTracker sends the "StopReceiver" message to the receiver actor on the executor, its ReceiverLauncher thread always times out, and I notice the corresponding job is cancelled only because the DAGScheduler is being stopped. The driver throws exception [1], while on the executor side the worker node logs info [2].
Exception [1]:
INFO sparkDriver-akka.actor.default-dispatcher-14 cluster.SparkDeploySchedulerBackend - Asking each executor to shut down
15:06:53,783 1.1.0.SNAP INFO Thread-40 scheduler.DAGScheduler - Job 1 failed: start at SparkDriver.java:109, took 72.739141 s
Exception in thread "Thread-40" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:702)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:701)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:701)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1428)
    at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundPostStop(DAGScheduler.scala:1375)
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
    at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
    at akka.actor.ActorCell.terminate(ActorCell.scala:369)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Info [2]:
INFO LocalActorRef: Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40127.0.0.1%3A52219-2#1619424601] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/12/10 15:06:53 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@localhost:51262] <- [akka.tcp://sparkExecutor@localhost:52217]: Error [Shut down address: akka.tcp://sparkExecutor@localhost:52217] [
akka.remote.ShutDownAssociation: Shut down address: akka.tcp://sparkExecutor@localhost:52217
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down.
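To illustrate the pattern I believe is timing out: the driver asks the receiver actor to stop and then blocks waiting for an acknowledgement. This is only a rough sketch of that ask-and-await pattern, not the actual ReceiverTracker code; the message, method name, and timeout are illustrative:
{code}
import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout

import scala.concurrent.Await
import scala.concurrent.duration._

// Rough sketch of the driver-side stop pattern (illustrative, not Spark's code):
// ask the remote receiver actor to stop, then block for its acknowledgement.
def askReceiverToStop(receiverActor: ActorRef): Unit = {
  implicit val timeout: Timeout = Timeout(10.seconds) // illustrative timeout
  val ack = receiverActor ? "StopReceiver" // placeholder message
  // If the executor side is already shutting down, this await times out,
  // which matches the behaviour I am seeing.
  Await.result(ack, timeout.duration)
}
{code}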
Please note that I am running everything on localhost; the same code works fine in "local" mode, and the issue above arises only in "cluster" mode. I tried changing the hostname to 127.0.0.1 but observed the same behaviour.
Any clues on what might be going on here would help a lot.
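For reference, here is a minimal sketch of what I am running (assuming a socket receiver as in NetworkWordCount; the master URL, host, port, and sleep durations are illustrative, not my exact setup):
{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StopReceiverRepro {
  def main(args: Array[String]): Unit = {
    // "cluster" mode: point at a standalone master instead of local[*]
    val conf = new SparkConf()
      .setMaster("spark://127.0.0.1:7077") // illustrative master URL
      .setAppName("StopReceiverRepro")
    val ssc = new StreamingContext(conf, Seconds(1))

    // A socket receiver, as in NetworkWordCount
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.print()

    ssc.start()
    Thread.sleep(10000) // let the receiver run for a while

    // Stop the streaming context but keep the SparkContext alive;
    // this is where the StopReceiver message appears to time out
    ssc.stop(stopSparkContext = false)
    Thread.sleep(60000) // observe whether the receiver actually stops
  }
}
{code}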
Thanks!
> Socket Receiver does not stop when streaming context is stopped
> ---------------------------------------------------------------
>
> Key: SPARK-2892
> URL: https://issues.apache.org/jira/browse/SPARK-2892
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.0.2
> Reporter: Tathagata Das
> Assignee: Tathagata Das
> Priority: Critical
>
> Running NetworkWordCount with
> {quote}
> ssc.start(); Thread.sleep(10000); ssc.stop(stopSparkContext = false); Thread.sleep(60000)
> {quote}
> gives the following error
> {quote}
> 14/08/06 18:37:13 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 10047 ms on localhost (1/1)
> 14/08/06 18:37:13 INFO DAGScheduler: Stage 0 (runJob at ReceiverTracker.scala:275) finished in 10.056 s
> 14/08/06 18:37:13 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
> 14/08/06 18:37:13 INFO SparkContext: Job finished: runJob at ReceiverTracker.scala:275, took 10.179263 s
> 14/08/06 18:37:13 INFO ReceiverTracker: All of the receivers have been terminated
> 14/08/06 18:37:13 WARN ReceiverTracker: All of the receivers have not deregistered, Map(0 -> ReceiverInfo(0,SocketReceiver-0,null,false,localhost,Stopped by driver,))
> 14/08/06 18:37:13 INFO ReceiverTracker: ReceiverTracker stopped
> 14/08/06 18:37:13 INFO JobGenerator: Stopping JobGenerator immediately
> 14/08/06 18:37:13 INFO RecurringTimer: Stopped timer for JobGenerator after time 1407375433000
> 14/08/06 18:37:13 INFO JobGenerator: Stopped JobGenerator
> 14/08/06 18:37:13 INFO JobScheduler: Stopped JobScheduler
> 14/08/06 18:37:13 INFO StreamingContext: StreamingContext stopped successfully
> 14/08/06 18:37:43 INFO SocketReceiver: Stopped receiving
> 14/08/06 18:37:43 INFO SocketReceiver: Closed socket to localhost:9999
> {quote}