George Hawkins created SPARK-19657:
--------------------------------------
Summary: start-master.sh accidentally forces the use of a loopback
address in master URL
Key: SPARK-19657
URL: https://issues.apache.org/jira/browse/SPARK-19657
Project: Spark
Issue Type: Bug
Components: Deploy
Affects Versions: 2.1.0
Environment: Ubuntu 16.04
Reporter: George Hawkins
{{start-master.sh}} contains the line:
{noformat}
SPARK_MASTER_HOST="`hostname -f`"
{noformat}
{{\-f}} means get the FQDN - the assumption seems to be that this name will
always resolve to a public IP address (note that if {{start-master.sh}} didn't
force the hostname by specifying {{--host}}, {{Master}} would by default
sensibly pick a public IP itself).
I came across this when I started a master and it output:
{noformat}
17/02/16 23:03:32 INFO Master: Starting Spark master at spark://myhostname:7077
{noformat}
But my external slaves could not connect to this URL and I was mystified when
on the master machine (with just one public IP address) the following both
failed:
{noformat}
$ telnet 192.168.1.133 7077
$ telnet 127.0.0.1 7077
{noformat}
{{192.168.1.133}} was the machine's public IP address and {{Master}} seemed to
be listening on neither the public IP address nor the loopback address. However
the following worked:
{noformat}
$ telnet myhostname 7077
{noformat}
It turns out this is a quirk of Debian and Ubuntu systems - the hostname maps
to a loopback address but not to the well known one {{127.0.0.1}}.
If you look in {{/etc/hosts}} you see:
{noformat}
127.0.0.1 localhost
127.0.1.1 myhostname
{noformat}
I looked at this many times before I noticed that it's not the same IP address
on both lines (I never knew that the entire {{127.0.0.0/8}} address block is
reserved for loopback purposes - see
[localhost|https://en.wikipedia.org/wiki/Localhost] on Wikipedia).
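To make the check concrete, here is a minimal shell sketch of how one might test whether the machine's own hostname resolves to a loopback address. The function names {{is_loopback_ipv4}} and {{resolve_first_address}} are hypothetical (they are not part of Spark), the loopback test is a plain textual prefix match on {{127.}}, and the resolver helper assumes a glibc system where {{getent}} is available:

```shell
# Succeed (exit status 0) if the given IPv4 address lies in 127.0.0.0/8.
# This is a textual prefix check, not a full address parser.
is_loopback_ipv4() {
  case "$1" in
    127.*.*.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the first address the system resolver returns for a hostname.
# Assumes glibc's getent, which consults /etc/hosts before DNS on a
# default Debian/Ubuntu setup.
resolve_first_address() {
  getent hosts "$1" | awk 'NR == 1 { print $1 }'
}

# Example use (not executed here):
#   addr="$(resolve_first_address "$(hostname -f)")"
#   if is_loopback_ipv4 "$addr"; then
#     echo "hostname resolves to loopback address $addr"
#   fi
```

On the machine described above, I would expect the example to report {{127.0.1.1}}, since that is the address {{/etc/hosts}} maps the hostname to.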
Why do Debian and Ubuntu do this? It seems there was a good, documented
reason for this way back in time - the {{127.0.1.1}} line used to always map to
an FQDN, i.e. you'd expect to see:
{noformat}
127.0.0.1 localhost
127.0.1.1 myhostname.some.domain
{noformat}
The Debian reference manual used to include the following section:
{quote}
Some software (e.g., GNOME) expects the system hostname to be resolvable to an
IP address with a canonical fully qualified domain name. This is really
improper because system hostnames and domain names are two very different
things; but there you have it. In order to support that software, it is
necessary to ensure that the system hostname can be resolved.
{quote}
However the [hostname resolution
section|https://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_the_hostname_resolution]
in the current reference, while still mentioning issues with software like
GNOME, no longer says that the {{127.0.1.1}} entry will be an FQDN.
In this [bug report|https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=719621]
you can see them discussing the change in documentation, i.e. removing the
statement that {{127.0.1.1}} always maps to an FQDN, but there's no explanation
of the reason for the change (the stated original purpose of this entry in
{{/etc/hosts}} seems to be lost by this change, so it seems odd not to explain
it).
So while it may be uncommon for a Spark master in a real cluster setup to lack
a static IP and an FQDN, that situation is probably quite common for people
getting started with Spark - i.e. starting the master on their personal machine
running Ubuntu on a network that uses DHCP. And it's quite confusing to find
that {{start-master.sh}} has started the master on an address that isn't
externally accessible (and it isn't immediately obvious from the master URL
that this is the case).
The simple solution seems to be not to specify the {{--host}} argument in
{{start-master.sh}} unless the length of {{$SPARK_MASTER_HOST}} is non-zero.
When {{--host}} is omitted, the Spark logic (working in the Java/Scala world
where it's far easier to query IP addresses, check if they're loopback
addresses etc.) already works out a sensible default public IP address to use.
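As a rough illustration of that suggestion (a sketch only, not the actual contents of {{start-master.sh}} and not a tested patch), the script could build the {{--host}} argument conditionally. The helper name {{master_host_args}} is hypothetical:

```shell
# Print "--host <value>" only when SPARK_MASTER_HOST is set and non-empty.
# Print nothing otherwise, so that Master falls back to its own default.
# master_host_args is a hypothetical helper name used for illustration.
master_host_args() {
  if [ -n "${SPARK_MASTER_HOST:-}" ]; then
    echo "--host $SPARK_MASTER_HOST"
  fi
}

# Example use (not executed here): the unquoted substitution is split into
# two words, or vanishes entirely when the variable is empty:
#   some-launcher $(master_host_args) --port 7077
```

The point of the design is that the script would no longer fall back to {{hostname -f}}, so a user who wants a specific address sets {{SPARK_MASTER_HOST}} explicitly and everyone else gets the default chosen by {{Master}}.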