[
https://issues.apache.org/jira/browse/SPARK-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212062#comment-14212062
]
Andrew Ash commented on SPARK-625:
----------------------------------
Spark is very sensitive to hostnames in Spark URLs, and that comes from Akka
being very sensitive. I've personally been bitten by hostnames vs FQDNs vs
external IP address vs loopback IP address, and it's really a pain.
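To illustrate why these variants bite, here is a minimal Python sketch of strict address matching. It is a simplified model, not Akka's or Spark's actual code: the helper names {{master_endpoint}} and {{matches_exactly}} are hypothetical, and the only assumption is that the host string in the URL must be identical to the one the master advertises, even when both names point at the same machine.

```python
from urllib.parse import urlsplit


def master_endpoint(url):
    """Split a spark://host:port URL into (host, port) without resolving it."""
    parts = urlsplit(url)
    if parts.scheme != "spark" or parts.hostname is None or parts.port is None:
        raise ValueError(f"not a spark://host:port URL: {url!r}")
    return parts.hostname, parts.port


def matches_exactly(advertised, requested):
    """Hypothetical model of strict matching: host strings must be identical.

    No DNS lookup is done, so a hostname, its FQDN, and the loopback
    address all count as different endpoints.
    """
    return master_endpoint(advertised) == master_endpoint(requested)


advertised = "spark://aash-mbp.local:7077"
print(matches_exactly(advertised, "spark://aash-mbp.local:7077"))  # True
print(matches_exactly(advertised, "spark://127.0.0.1:7077"))       # False
```

Under this model the safest choice is to copy the URL exactly as the master web UI shows it.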
On the current master branch (1.2), with the Spark standalone master listening on
{{spark://aash-mbp.local:7077}} (as confirmed by the master web UI) and the
spark shell pointed at {{spark://127.0.0.1:7077}} via the
{{--master}} parameter, the driver makes 3 connection attempts and then fails
with this message:
{noformat}
14/11/14 01:37:56 INFO AppClient$ClientActor: Connecting to master
spark://127.0.0.1:7077...
14/11/14 01:37:56 WARN AppClient$ClientActor: Could not connect to
akka.tcp://sparkMaster@127.0.0.1:7077: akka.remote.InvalidAssociation: Invalid
address: akka.tcp://sparkMaster@127.0.0.1:7077
14/11/14 01:37:56 WARN Remoting: Tried to associate with unreachable remote
address [akka.tcp://sparkMaster@127.0.0.1:7077]. Address is now gated for 5000
ms, all messages to this address will be delivered to dead letters. Reason:
Connection refused: /127.0.0.1:7077
14/11/14 01:38:16 INFO AppClient$ClientActor: Connecting to master
spark://127.0.0.1:7077...
14/11/14 01:38:16 WARN Remoting: Tried to associate with unreachable remote
address [akka.tcp://sparkMaster@127.0.0.1:7077]. Address is now gated for 5000
ms, all messages to this address will be delivered to dead letters. Reason:
Connection refused: /127.0.0.1:7077
14/11/14 01:38:16 WARN AppClient$ClientActor: Could not connect to
akka.tcp://sparkMaster@127.0.0.1:7077: akka.remote.InvalidAssociation: Invalid
address: akka.tcp://sparkMaster@127.0.0.1:7077
14/11/14 01:38:36 INFO AppClient$ClientActor: Connecting to master
spark://127.0.0.1:7077...
14/11/14 01:38:36 WARN Remoting: Tried to associate with unreachable remote
address [akka.tcp://sparkMaster@127.0.0.1:7077]. Address is now gated for 5000
ms, all messages to this address will be delivered to dead letters. Reason:
Connection refused: /127.0.0.1:7077
14/11/14 01:38:36 WARN AppClient$ClientActor: Could not connect to
akka.tcp://sparkMaster@127.0.0.1:7077: akka.remote.InvalidAssociation: Invalid
address: akka.tcp://sparkMaster@127.0.0.1:7077
14/11/14 01:38:56 ERROR SparkDeploySchedulerBackend: Application has been
killed. Reason: All masters are unresponsive! Giving up.
14/11/14 01:38:56 WARN SparkDeploySchedulerBackend: Application ID is not
initialized yet.
14/11/14 01:38:56 ERROR TaskSchedulerImpl: Exiting due to error from cluster
scheduler: All masters are unresponsive! Giving up.
{noformat}
So the hang seems to be gone, replaced by a reasonable 3 attempts followed by a
clean failure.
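The behavior visible in the log (three attempts 20 seconds apart, then giving up) can be sketched as a bounded retry loop. This is only an illustration in Python under stated assumptions, not Spark's AppClient implementation: {{connect_with_retries}}, {{try_connect}}, and the default values are hypothetical names and numbers read off the timestamps above.

```python
import time


def connect_with_retries(try_connect, attempts=3, interval_s=20.0, sleep=time.sleep):
    """Call try_connect() up to `attempts` times, waiting between tries.

    Returns the 1-based attempt number that succeeded. Raises
    ConnectionError once every attempt has failed, instead of hanging.
    """
    for attempt in range(1, attempts + 1):
        if try_connect():
            return attempt
        # Wait after each failed attempt, including the last one, which
        # mirrors the 20 s gap before the final error in the log.
        sleep(interval_s)
    raise ConnectionError("All masters are unresponsive! Giving up.")


# Simulate an unreachable master; record the waits instead of sleeping.
waits = []
try:
    connect_with_retries(lambda: False, sleep=waits.append)
except ConnectionError as err:
    print(err, sum(waits))  # total simulated wait is 60.0 seconds
```

The point of the bound is that the caller always gets a definite answer: either a connection or an error it can report.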
[~joshrosen], short of changing Akka ourselves to make it less strict about exact
URL matches, is there anything else we can do for this ticket? I think we can
reasonably close this as fixed.
> Client hangs when connecting to standalone cluster using wrong address
> ----------------------------------------------------------------------
>
> Key: SPARK-625
> URL: https://issues.apache.org/jira/browse/SPARK-625
> Project: Spark
> Issue Type: Bug
> Affects Versions: 0.7.0, 0.7.1, 0.8.0
> Reporter: Josh Rosen
> Priority: Minor
>
> I launched a standalone cluster on my laptop, connecting the workers to the
> master using my machine's public IP address (128.32.*.*:7077). If I try to
> connect spark-shell to the master using "spark://0.0.0.0:7077", it
> successfully brings up a Scala prompt but hangs when I try to run a job.
> From the standalone master's log, it looks like the client's messages are
> being dropped without the client discovering that the connection has failed:
> {code}
> 12/11/27 14:00:52 ERROR NettyRemoteTransport(null): dropping message
> RegisterJob(JobDescription(Spark shell)) for non-local recipient
> akka://[email protected]:7077/user/Master at akka://[email protected].*.*:7077 local
> is akka://[email protected].*.*:7077
> 12/11/27 14:00:52 ERROR NettyRemoteTransport(null): dropping message
> DaemonMsgWatch(Actor[akka://[email protected].*.*:57518/user/$a],Actor[akka://[email protected]:7077/user/Master])
> for non-local recipient akka://[email protected]:7077/remote at
> akka://[email protected].*.*:7077 local is akka://[email protected].*.*:7077
> {code}