[ https://issues.apache.org/jira/browse/SPARK-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241951#comment-14241951 ]

Ilayaperumal Gopinathan commented on SPARK-2892:
------------------------------------------------

To add more info:

When the ReceiverTracker sends the "StopReceiver" message to the receiver actor 
on the executor, its ReceiverLauncher thread always times out, and I see the 
corresponding job get cancelled only as a side effect of the DAGScheduler being 
stopped. The driver then throws exception[1], while on the executor side the 
worker node logs info[2].
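
For context, the receiver-side hook that the "StopReceiver" message is supposed 
to trigger looks roughly like this in a custom receiver (a minimal sketch 
against the public Receiver API, not Spark's actual SocketReceiver; the class 
name and read loop are illustrative only):

{code}
// Sketch of a custom receiver's stop path, assuming the public
// org.apache.spark.streaming.receiver.Receiver API. NOT Spark's actual
// SocketReceiver; class name and read loop are illustrative only.
import java.io.{BufferedReader, InputStreamReader}
import java.net.Socket

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class SketchSocketReceiver(host: String, port: Int)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  @volatile private var socket: Socket = null

  def onStart(): Unit = {
    // Runs on the executor; return quickly and read on a separate thread.
    new Thread("Sketch Socket Receiver") {
      override def run(): Unit = { receive() }
    }.start()
  }

  def onStop(): Unit = {
    // The driver's StopReceiver message should end up here. Closing the
    // socket unblocks a reader stuck in readLine(), letting the loop exit.
    if (socket != null) socket.close()
  }

  private def receive(): Unit = {
    socket = new Socket(host, port)
    val reader = new BufferedReader(new InputStreamReader(socket.getInputStream()))
    var line = reader.readLine()
    while (!isStopped() && line != null) {
      store(line) // hand each record to Spark
      line = reader.readLine()
    }
  }
}
{code}

If onStop() never runs, or the read loop never observes isStopped(), the 
receiver task keeps running and the job only dies when the SparkContext is 
torn down, which would match the logs below.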

Exception[1]:
INFO sparkDriver-akka.actor.default-dispatcher-14 cluster.SparkDeploySchedulerBackend - Asking each executor to shut down
15:06:53,783 1.1.0.SNAP  INFO Thread-40 scheduler.DAGScheduler - Job 1 failed: start at SparkDriver.java:109, took 72.739141 s
Exception in thread "Thread-40" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:702)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:701)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
        at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:701)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1428)
        at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundPostStop(DAGScheduler.scala:1375)
        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
        at akka.actor.ActorCell.terminate(ActorCell.scala:369)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Info[2]:
INFO LocalActorRef: Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40127.0.0.1%3A52219-2#1619424601] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/12/10 15:06:53 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@localhost:51262] <- [akka.tcp://sparkExecutor@localhost:52217]: Error [Shut down address: akka.tcp://sparkExecutor@localhost:52217] [
akka.remote.ShutDownAssociation: Shut down address: akka.tcp://sparkExecutor@localhost:52217
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down.

Please note that I am running everything on localhost. The same code works 
fine in "local" mode; the issue only arises in "cluster" mode. I also tried 
changing the hostname to 127.0.0.1 and saw the same behavior.
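
For completeness, my reproduction boils down to the same pattern as the 
snippet in the issue description below (a sketch; the app name, master URL, 
and port are placeholders for my setup):

{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StopReceiverRepro {
  def main(args: Array[String]): Unit = {
    // "local[2]" shuts down cleanly; a standalone master reproduces the hang.
    val conf = new SparkConf()
      .setAppName("StopReceiverRepro")
      .setMaster("spark://127.0.0.1:7077") // placeholder master URL
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.socketTextStream("localhost", 9999).print()

    ssc.start()
    Thread.sleep(10000)
    // The receiver should stop here; instead StopReceiver times out.
    ssc.stop(stopSparkContext = false)
    Thread.sleep(60000)
  }
}
{code}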

Any clues on what might be going on here would help a lot.
Thanks!

> Socket Receiver does not stop when streaming context is stopped
> ---------------------------------------------------------------
>
>                 Key: SPARK-2892
>                 URL: https://issues.apache.org/jira/browse/SPARK-2892
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.0.2
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Critical
>
> Running NetworkWordCount with
> {quote}
> ssc.start(); Thread.sleep(10000); ssc.stop(stopSparkContext = false); Thread.sleep(60000)
> {quote}
> gives the following error
> {quote}
> 14/08/06 18:37:13 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 10047 ms on localhost (1/1)
> 14/08/06 18:37:13 INFO DAGScheduler: Stage 0 (runJob at ReceiverTracker.scala:275) finished in 10.056 s
> 14/08/06 18:37:13 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
> 14/08/06 18:37:13 INFO SparkContext: Job finished: runJob at ReceiverTracker.scala:275, took 10.179263 s
> 14/08/06 18:37:13 INFO ReceiverTracker: All of the receivers have been terminated
> 14/08/06 18:37:13 WARN ReceiverTracker: All of the receivers have not deregistered, Map(0 -> ReceiverInfo(0,SocketReceiver-0,null,false,localhost,Stopped by driver,))
> 14/08/06 18:37:13 INFO ReceiverTracker: ReceiverTracker stopped
> 14/08/06 18:37:13 INFO JobGenerator: Stopping JobGenerator immediately
> 14/08/06 18:37:13 INFO RecurringTimer: Stopped timer for JobGenerator after time 1407375433000
> 14/08/06 18:37:13 INFO JobGenerator: Stopped JobGenerator
> 14/08/06 18:37:13 INFO JobScheduler: Stopped JobScheduler
> 14/08/06 18:37:13 INFO StreamingContext: StreamingContext stopped successfully
> 14/08/06 18:37:43 INFO SocketReceiver: Stopped receiving
> 14/08/06 18:37:43 INFO SocketReceiver: Closed socket to localhost:9999
> {quote}



