I did a little more research on this. It looks like the worker started 
successfully, but on port 40294; that port shows up in both the log and the 
master web UI. The question is why, in the log, the master's akka.tcp is 
trying to connect to a different port (44017).
Yong
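
For what it's worth, the two numbers come from differently named endpoints in the log: 40294 belongs to the worker, while the address the master keeps failing to reach is labeled sparkDriver, i.e. the spark-shell driver's Akka endpoint. A small hedged sketch for pulling the actor-system name and port out of an akka.tcp URL in such a log line (the sample line is copied from the master log quoted below):

```shell
# Hedged sketch: split an akka.tcp URL from a Spark log line into its actor
# system name and port, to tell driver endpoints from worker endpoints.
line='14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.'
url=$(printf '%s\n' "$line" | grep -o 'akka\.tcp://[^ ]*')
system=${url#akka.tcp://}; system=${system%%@*}   # actor system name
port=${url##*:}                                   # remote port
echo "$system $port"                              # -> sparkDriver 44017
```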

From: java8...@hotmail.com
To: u...@spark.incubator.apache.org
Subject: Problem to run spark as standalone
Date: Mon, 27 Oct 2014 11:38:32 -0400




Hi, Spark Users:
I tried to test Spark on a standalone box, but ran into an issue whose root 
cause I can't figure out. I basically followed the documentation for deploying 
Spark in a standalone environment exactly.
1) I checked out the Spark source code of release 1.1.0.
2) I built Spark with the following command, which succeeded:
   ./make-distribution.sh -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests
3) I made sure I can ssh to localhost as myself using an ssh key.
4) I ran sbin/start-all.sh; it looked fine, at least I saw 2 Java processes running.
5) I ran the following command:
   yzhang@yzhang-linux:/opt/spark-1.1.0-bin-hadoop2.4.0/bin$ ./spark-shell --master spark://yzhang-linux:7077
I saw the following messages, and then the shell exited on its own.
14/10/27 11:22:53 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala> 14/10/27 11:23:13 INFO client.AppClient$ClientActor: Connecting to master spark://yzhang-linux:7077...
14/10/27 11:23:33 INFO client.AppClient$ClientActor: Connecting to master spark://yzhang-linux:7077...
14/10/27 11:23:53 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
14/10/27 11:23:53 ERROR scheduler.TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
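
One quick check worth running from the same box before digging into the logs: confirm the master port is actually accepting connections. A minimal hedged sketch using bash's built-in /dev/tcp pseudo-device (hostname and port are the ones from this report):

```shell
# Hedged sketch: probe a TCP port with bash's /dev/tcp redirection.
# check_port succeeds only if the connection is accepted within 2 seconds.
check_port() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if check_port yzhang-linux 7077; then
  echo "master port reachable"
else
  echo "master port NOT reachable"
fi
```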
Now, I checked the log files and found the following messages in the master 
log:
14/10/27 11:22:53 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@yzhang-linux:7077/]] arriving at [akka.tcp://sparkMaster@yzhang-linux:7077] inbound addresses are [akka.tcp://sparkMaster@yzhang-linux:7077]
14/10/27 11:23:13 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@yzhang-linux:7077/]] arriving at [akka.tcp://sparkMaster@yzhang-linux:7077] inbound addresses are [akka.tcp://sparkMaster@yzhang-linux:7077]
14/10/27 11:23:33 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@yzhang-linux:7077/]] arriving at [akka.tcp://sparkMaster@yzhang-linux:7077] inbound addresses are [akka.tcp://sparkMaster@yzhang-linux:7077]
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 INFO actor.LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40192.168.240.8%3A63348-2#1992401281] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/10/27 11:23:53 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@yzhang-linux:7077] -> [akka.tcp://sparkDriver@yzhang-linux:44017]: Error [Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: yzhang-linux/192.168.240.8:44017]
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@yzhang-linux:7077] -> [akka.tcp://sparkDriver@yzhang-linux:44017]: Error [Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: yzhang-linux/192.168.240.8:44017]
14/10/27 11:23:53 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@yzhang-linux:7077] -> [akka.tcp://sparkDriver@yzhang-linux:44017]: Error [Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: yzhang-linux/192.168.240.8:44017]
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
Any reason why this is happening? The Spark web UI looks normal, and there is 
no error message in the worker log. This is a standalone box with no firewall, 
and the hostname and IP resolve on the box itself without any problem.
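
One way to double-check that last claim is to resolve the hostname through getent, which consults /etc/hosts and DNS in the same order libc does, and compare the result against the address the master advertises in the log (192.168.240.8). A hedged sketch:

```shell
# Hedged sketch: resolve a hostname the way libc would (via /etc/hosts and
# nsswitch order), then print the first address it maps to.
host=yzhang-linux   # hostname from this report; adjust as needed
if addr=$(getent hosts "$host" | awk '{print $1; exit}') && [ -n "$addr" ]; then
  echo "$host resolves to $addr"
else
  echo "$host does not resolve"
fi
```

A mismatch between what getent returns and the address in the log would be one explanation for refused connections.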
Thanks for your help.
Yong