[ https://issues.apache.org/jira/browse/SPARK-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-5113:
-----------------------------
Target Version/s: (was: 1.3.0)
> Audit and document use of hostnames and IP addresses in Spark
> -------------------------------------------------------------
>
> Key: SPARK-5113
> URL: https://issues.apache.org/jira/browse/SPARK-5113
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Patrick Wendell
> Priority: Critical
>
> Spark has multiple network components that start servers and advertise their
> network addresses to other processes.
>
> We should go through each of these components and make sure they have
> consistent and/or documented behavior with respect to (a) which interface(s)
> they bind to and (b) which hostname they use to advertise themselves to other
> processes. We should document this clearly and explain what users should do
> in different environments (e.g., EC2 or Docker containers).
>
> When Spark initializes, it searches the network interfaces until it finds an
> address that is not a loopback address. It then performs a reverse DNS lookup
> to find the hostname associated with that address, and the network
> components use that hostname to advertise themselves to other processes.
> That hostname is also used as the Akka system identifier (Akka accepts only
> a single name, which it uses both as the bind interface and as the actor
> identifier). In some cases the hostname is also used as the bind hostname
> (I think this happens in the connection manager and possibly in Akka), which
> likely causes it to be re-resolved to an IP address internally. In other
> cases (the web UI and Netty shuffle) we seem to bind to all interfaces.
>
> The best outcome would be to have three configs that can be set on each
> machine:
> {code}
> SPARK_LOCAL_IP           # IP address we bind to for all services
> SPARK_INTERNAL_HOSTNAME  # Hostname we advertise to remote processes
>                          # within the cluster
> SPARK_EXTERNAL_HOSTNAME  # Hostname we advertise to processes outside
>                          # the cluster (e.g. the UI)
> {code}
>
> It's not clear how easily we can support that scheme while preserving
> backward compatibility. The last one (SPARK_EXTERNAL_HOSTNAME) is easy: it's
> just an alias for what is now SPARK_PUBLIC_DNS.
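One way the proposed scheme could stay backward compatible is a simple precedence chain per setting. The Java sketch below is hypothetical: the class ProposedNetworkConfig, its resolve method, treating empty variables as unset, and the fallback order (new variable, then existing variable, then auto-detected value) are all assumptions made for illustration. Only SPARK_LOCAL_IP and SPARK_PUBLIC_DNS are existing settings; SPARK_INTERNAL_HOSTNAME and SPARK_EXTERNAL_HOSTNAME are the names proposed in this issue.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical resolver for the three settings proposed in this issue.
// The fallback order below is an assumption, not agreed behavior.
final class ProposedNetworkConfig {
    final String bindAddress;      // SPARK_LOCAL_IP
    final String internalHostname; // SPARK_INTERNAL_HOSTNAME (proposed)
    final String externalHostname; // SPARK_EXTERNAL_HOSTNAME (proposed)

    private ProposedNetworkConfig(String bindAddress, String internalHostname,
                                  String externalHostname) {
        this.bindAddress = bindAddress;
        this.internalHostname = internalHostname;
        this.externalHostname = externalHostname;
    }

    // Treats unset and empty variables the same way.
    private static Optional<String> lookup(Map<String, String> env, String key) {
        String value = env.get(key);
        return (value == null || value.isEmpty()) ? Optional.empty() : Optional.of(value);
    }

    // env would normally be System.getenv(); detectedIp and detectedHostname
    // come from interface discovery and reverse DNS.
    static ProposedNetworkConfig resolve(Map<String, String> env,
                                         String detectedIp, String detectedHostname) {
        String bind = lookup(env, "SPARK_LOCAL_IP").orElse(detectedIp);
        String internal = lookup(env, "SPARK_INTERNAL_HOSTNAME").orElse(detectedHostname);
        // The new name wins; SPARK_PUBLIC_DNS is honored as an alias, and the
        // internal hostname is the last resort.
        String external = lookup(env, "SPARK_EXTERNAL_HOSTNAME")
                .orElseGet(() -> lookup(env, "SPARK_PUBLIC_DNS").orElse(internal));
        return new ProposedNetworkConfig(bind, internal, external);
    }
}
```

For example, with only SPARK_PUBLIC_DNS set, resolve reports that value as the external hostname while still binding to the detected IP address.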
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)