[
https://issues.apache.org/jira/browse/SPARK-14266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-14266.
-------------------------------
Resolution: Invalid
Fix Version/s: (was: 1.4.1)
Target Version/s: (was: 1.4.1)
I'm going to close this because too many things could cause it, most of
them configuration problems. All you show is that various services can't
communicate with each other, which suggests a network config problem.
Also, read
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before
opening a JIRA. For example, you should never set Blocker, or Target/Fix version.
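
One way to narrow down a problem like this is to test plain TCP reachability
between the machines before looking at Spark itself. The following is only a
minimal sketch, not part of Spark: the names can_connect and check_all are
hypothetical helpers, and the hosts and ports you pass in are whatever your
own logs report (the driver port in particular changes on every run unless
it is pinned in the configuration).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: treat all of these as "not reachable".
        return False
    conn.close()
    return True


def check_all(targets, timeout=3.0):
    """Map each (host, port) pair in targets to its reachability result."""
    return dict(((host, port), can_connect(host, port, timeout))
                for host, port in targets)
```

Run from a worker, can_connect("192.168.1.81", 7077) would check the master
port used in the report below; the same call with the port printed in the
driver's "Successfully started service 'sparkDriver'" log line would check
whether executors can reach the driver. A False result points at addressing,
routing, or firewall settings rather than at Spark.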
> Association with remote system [akka.tcp://[email protected]:34047]
> has failed, address is now gated for [5000] ms. Reason is: [Association
> failed$
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-14266
> URL: https://issues.apache.org/jira/browse/SPARK-14266
> Project: Spark
> Issue Type: Bug
> Components: PySpark, Spark Core
> Affects Versions: 1.4.1
> Environment: Ubuntu,
> Spark 1.4.1
> Python 2.7
> java version "1.7.0_95"
> OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.1)
> OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)
> Reporter: Pavan Kumar
> Priority: Blocker
>
> I have a Spark standalone cluster configured with 1 master and 3 slaves.
> Configuration in the master's spark-env.sh:
> export SPARK_PUBLIC_DNS="173.220.132.82"
> export SPARK_WORKER_CORES=6
> SPARK_MASTER_IP='192.168.1.81'
> SPARK_LOCAL_IP='192.168.1.81'
> Configuration in the master machine's /conf/slaves:
> 192.168.1.82
> 192.168.1.83
> 192.168.1.84
> These are my 3 slaves.
> Now when trying to run
> ubuntu@MyCareerVM1:/usr/local/spark$ MASTER=spark://192.168.1.81:7077
> bin/pyspark
> It continuously throws errors.
> Error logs from the master:
> ubuntu@MyCareerVM1:/usr/local/spark$ MASTER=spark://192.168.1.81:7077
> bin/pyspark
> Python 2.7.6 (default, Jun 22 2015, 17:58:13)
> [GCC 4.8.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> 16/03/29 09:16:48 INFO SparkContext: Running Spark version 1.4.1
> 16/03/29 09:16:48 WARN NativeCodeLoader: Unable to load native-hadoop library
> for your platform... using builtin-java classes where applicable
> 16/03/29 09:16:49 INFO SecurityManager: Changing view acls to: ubuntu
> 16/03/29 09:16:49 INFO SecurityManager: Changing modify acls to: ubuntu
> 16/03/29 09:16:49 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(ubuntu); users
> with modify permissions: Set(ubuntu)
> 16/03/29 09:16:49 INFO Slf4jLogger: Slf4jLogger started
> 16/03/29 09:16:50 INFO Remoting: Starting remoting
> 16/03/29 09:16:50 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://[email protected]:34901]
> 16/03/29 09:16:50 INFO Utils: Successfully started service 'sparkDriver' on
> port 34901.
> 16/03/29 09:16:50 INFO SparkEnv: Registering MapOutputTracker
> 16/03/29 09:16:50 INFO SparkEnv: Registering BlockManagerMaster
> 16/03/29 09:16:50 INFO DiskBlockManager: Created local directory at
> /tmp/spark-a77016c9-a9ae-49c5-908f-fc540dc7d3ff/blockmgr-a9e868af-4253-4230-9227-948fbb8a0d91
> 16/03/29 09:16:50 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
> 16/03/29 09:16:50 INFO HttpFileServer: HTTP File server directory is
> /tmp/spark-a77016c9-a9ae-49c5-908f-fc540dc7d3ff/httpd-a78e633c-0ae7-46cf-81e8-776d8f7c3c46
> 16/03/29 09:16:50 INFO HttpServer: Starting HTTP Server
> 16/03/29 09:16:50 INFO Utils: Successfully started service 'HTTP file server'
> on port 34364.
> 16/03/29 09:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/03/29 09:16:50 INFO Utils: Successfully started service 'SparkUI' on port
> 4040.
> 16/03/29 09:16:50 INFO SparkUI: Started SparkUI at http://173.220.132.82:4040
> 16/03/29 09:16:50 INFO AppClient$ClientActor: Connecting to master
> akka.tcp://[email protected]:7077/user/Master...
> 16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Connected to Spark
> cluster with app ID app-20160329091651-0006
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added:
> app-20160329091651-0006/0 on worker-20160329072744-192.168.1.84-45492
> (192.168.1.84:45492) with 6 cores
> 16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID
> app-20160329091651-0006/0 on hostPort 192.168.1.84:45492 with 6 cores, 512.0
> MB RAM
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added:
> app-20160329091651-0006/1 on worker-20160329072744-192.168.1.82-45482
> (192.168.1.82:45482) with 6 cores
> 16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID
> app-20160329091651-0006/1 on hostPort 192.168.1.82:45482 with 6 cores, 512.0
> MB RAM
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor added:
> app-20160329091651-0006/2 on worker-20160329072746-192.168.1.83-38065
> (192.168.1.83:38065) with 6 cores
> 16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: Granted executor ID
> app-20160329091651-0006/2 on hostPort 192.168.1.83:38065 with 6 cores, 512.0
> MB RAM
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated:
> app-20160329091651-0006/2 is now LOADING
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated:
> app-20160329091651-0006/1 is now LOADING
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated:
> app-20160329091651-0006/0 is now LOADING
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated:
> app-20160329091651-0006/0 is now RUNNING
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated:
> app-20160329091651-0006/1 is now RUNNING
> 16/03/29 09:16:51 INFO AppClient$ClientActor: Executor updated:
> app-20160329091651-0006/2 is now RUNNING
> 16/03/29 09:16:51 INFO Utils: Successfully started service
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42458.
> 16/03/29 09:16:51 INFO NettyBlockTransferService: Server created on 42458
> 16/03/29 09:16:51 INFO BlockManagerMaster: Trying to register BlockManager
> 16/03/29 09:16:51 INFO BlockManagerMasterEndpoint: Registering block manager
> 192.168.1.81:42458 with 265.4 MB RAM, BlockManagerId(driver, 192.168.1.81,
> 42458)
> 16/03/29 09:16:51 INFO BlockManagerMaster: Registered BlockManager
> 16/03/29 09:16:51 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready
> for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
> Welcome to
> ____ __
> / __/__ ___ _____/ /__
> _\ \/ _ \/ _ `/ __/ '_/
> /__ / .__/\_,_/_/ /_/\_\ version 1.4.1
> /_/
> Using Python version 2.7.6 (default, Jun 22 2015 17:58:13)
> SparkContext available as sc, HiveContext available as sqlContext.
> >>> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated:
> app-20160329091651-0006/0 is now EXITED (Command exited with code 1)
> 16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Executor
> app-20160329091651-0006/0 removed: Command exited with code 1
> 16/03/29 09:16:53 ERROR SparkDeploySchedulerBackend: Asked to remove
> non-existent executor 0
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor added:
> app-20160329091651-0006/3 on worker-20160329072744-192.168.1.84-45492
> (192.168.1.84:45492) with 6 cores
> 16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Granted executor ID
> app-20160329091651-0006/3 on hostPort 192.168.1.84:45492 with 6 cores, 512.0
> MB RAM
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated:
> app-20160329091651-0006/2 is now EXITED (Command exited with code 1)
> 16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Executor
> app-20160329091651-0006/2 removed: Command exited with code 1
> 16/03/29 09:16:53 ERROR SparkDeploySchedulerBackend: Asked to remove
> non-existent executor 2
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor added:
> app-20160329091651-0006/4 on worker-20160329072746-192.168.1.83-38065
> (192.168.1.83:38065) with 6 cores
> 16/03/29 09:16:53 INFO SparkDeploySchedulerBackend: Granted executor ID
> app-20160329091651-0006/4 on hostPort 192.168.1.83:38065 with 6 cores, 512.0
> MB RAM
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated:
> app-20160329091651-0006/3 is now LOADING
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated:
> app-20160329091651-0006/4 is now LOADING
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated:
> app-20160329091651-0006/3 is now RUNNING
> 16/03/29 09:16:53 INFO AppClient$ClientActor: Executor updated:
> app-20160329091651-0006/4 is now RUNNING
> 16/03/29 09:16:54 INFO AppClient$ClientActor: Executor updated:
> app-20160329091651-0006/1 is now EXITED (Command exited with code 1)
> 16/03/29 09:16:54 INFO SparkDeploySchedulerBackend: Executor
> app-20160329091651-0006/1 removed: Command exited with code 1
> 16/03/29 09:16:54 ERROR SparkDeploySchedulerBackend: Asked to remove
> non-existent executor 1
> These are the logs from my slaves:
> Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
> 16/03/30 01:50:18 INFO CoarseGrainedExecutorBackend: Registered signal
> handlers for [TERM, HUP, INT]
> 16/03/30 01:50:19 WARN NativeCodeLoader: Unable to load native-hadoop library
> for your platform... using builtin-java classes where applicable
> 16/03/30 01:50:19 INFO SecurityManager: Changing view acls to: ubuntu
> 16/03/30 01:50:19 INFO SecurityManager: Changing modify acls to: ubuntu
> 16/03/30 01:50:19 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(ubuntu); users
> with modify permissions: Set(ubuntu)
> 16/03/30 01:50:20 INFO Slf4jLogger: Slf4jLogger started
> 16/03/30 01:50:20 INFO Remoting: Starting remoting
> 16/03/30 01:50:20 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://[email protected]:38333]
> 16/03/30 01:50:20 INFO Utils: Successfully started service
> 'driverPropsFetcher' on port 38333.
> 16/03/30 01:50:20 WARN ReliableDeliverySupervisor: Association with remote
> system [akka.tcp://[email protected]:34047] has failed, address is now
> gated for [5000] ms. Reason is: [Association failed$
> Exception in thread "main" akka.actor.ActorNotFound: Actor not found for:
> ActorSelection[Anchor(akka.tcp://[email protected]:34047/),
> Path(/user/CoarseGrainedScheduler)]
> at
> akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
> at
> akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> at
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
> at
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
> at
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
> at
> akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
> at
> akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
> at
> akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
> at
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> at
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
> at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:508)
> at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:541)
> at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:531)
> at
> akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
> at akka.remote.EndpointWriter.postStop(Endpoint.scala:561)
> at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
> at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:415)
> at
> akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
> at
> akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
> at akka.actor.ActorCell.terminate(ActorCell.scala:369)
> at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
> at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
> at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
> at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 16/03/30 01:50:20 INFO Utils: Shutdown hook called