[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eli Collins updated HDFS-3150:
------------------------------
Description:
The DN listens on multiple IP addresses (the default {{dfs.datanode.address}}
is the wildcard); however, per HADOOP-6867 only the source address (IP) of the
registration is given to clients. HADOOP-985 made clients access datanodes by
IP, primarily to avoid the latency of a DNS lookup. This had the side effect of
breaking DN multihoming: the client cannot route to the IP exposed by the NN if
the DN registers with an interface that has a cluster-private IP. To fix this,
let's add back the option for Datanodes to be accessed by hostname.
This can be done by:
# Modifying the primary field of the Datanode descriptor to be the hostname, or
# Modifying Client/Datanode <-> Datanode access to use the hostname field
instead of the IP
Approach #2 does not require an incompatible client protocol change, and is
much less invasive. It limits the modification to the places where clients and
Datanodes connect, rather than changing every use of Datanode identifiers.
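To make approach #2 concrete, here is a minimal sketch of choosing the
address at the connection site. The class, field, and method names are
hypothetical and are not taken from the patch. The sketch only assumes that
the descriptor carries both an IP and a hostname, and that the caller passes
in a boolean saying which one to dial.

```java
import java.net.InetSocketAddress;

// Hypothetical stand-in for a Datanode descriptor. This is NOT the real
// DatanodeID class; the field and method names are assumptions.
final class DatanodeDescriptorSketch {
    final String ipAddr;    // source IP the DN registered with
    final String hostName;  // hostname reported by the DN
    final int xferPort;     // data transfer port

    DatanodeDescriptorSketch(String ipAddr, String hostName, int xferPort) {
        this.ipAddr = ipAddr;
        this.hostName = hostName;
        this.xferPort = xferPort;
    }

    // Approach #2: the descriptor keeps both fields, and only the code that
    // opens a connection decides which one to dial.
    String connectHost(boolean useHostname) {
        return useHostname ? hostName : ipAddr;
    }

    // Built unresolved so that this sketch never triggers a DNS lookup itself.
    InetSocketAddress connectAddress(boolean useHostname) {
        return InetSocketAddress.createUnresolved(connectHost(useHostname), xferPort);
    }
}
```

Because the IP stays the primary field of the descriptor, nothing sent over
the wire has to change in this sketch. Only the call sites that open sockets
consult the flag.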
New client and Datanode configuration options are introduced:
- {{dfs.client.use.datanode.hostname}} indicates that all client-to-Datanode
connections should use the Datanode hostname (clients outside the cluster may
not be able to route to the IP)
- {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should use
hostnames when connecting to other Datanodes for data transfer
If the configuration options are not used, there is no change in the current
behavior.
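As a rough illustration of the "no change unless configured" behavior, the
sketch below reads the two options with a default of false. It uses
{{java.util.Properties}} as a stand-in for the Hadoop configuration object,
and the helper class and method names are hypothetical. Only the two option
names come from this issue.

```java
import java.util.Properties;

// Hypothetical helper; Properties stands in for the Hadoop configuration.
final class HostnameOptionsSketch {
    static final String CLIENT_KEY = "dfs.client.use.datanode.hostname";
    static final String DATANODE_KEY = "dfs.datanode.use.datanode.hostname";

    // An unset option reads as false, which keeps the existing
    // connect-by-IP behavior.
    private static boolean flag(Properties conf, String key) {
        return Boolean.parseBoolean(conf.getProperty(key, "false"));
    }

    // Consulted where a client connects to a Datanode.
    static boolean clientUsesHostname(Properties conf) {
        return flag(conf, CLIENT_KEY);
    }

    // Consulted where a Datanode connects to another Datanode for data transfer.
    static boolean datanodeUsesHostname(Properties conf) {
        return flag(conf, DATANODE_KEY);
    }
}
```

The two flags are read independently, so in this sketch a client outside the
cluster can opt in to hostnames without changing how Datanodes reach each
other.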
was:
Per the document attached to HADOOP-8198, this is just for branch-1, and
unbreaks DN multihoming. The datanode can be configured to listen on a bond, or
all interfaces by specifying the wildcard in the dfs.datanode.*.address
configuration options, however per HADOOP-6867 only the source address of the
registration is exposed to clients. HADOOP-985 made clients access datanodes by
IP, primarily to avoid the latency of a DNS lookup. This had the side effect of
breaking DN multihoming. To fix it, let's add back the option for
Datanodes to be accessed by hostname. This can be done by:
# Modifying the primary field of the Datanode descriptor to be the hostname, or
# Modifying Client/Datanode <-> Datanode access to use the hostname field
instead of the IP
I'd like to go with approach #2 as it does not require making an incompatible
change to the client protocol, and is much less invasive. It minimizes the
scope of modification to just places where clients and Datanodes connect, vs
changing all uses of Datanode identifiers.
New client and Datanode configuration options are introduced:
- {{dfs.client.use.datanode.hostname}} indicates all client to datanode
connections should use the datanode hostname (as clients outside cluster may
not be able to route the IP)
- {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should use
hostnames when connecting to other Datanodes for data transfer
If the configuration options are not used, there is no change in the current
behavior.
I'm doing something similar to #1 btw in trunk in HDFS-3144 - refactoring the
use of DatanodeID to use the right field (IP, IP:xferPort, hostname, etc) based
on the context the ID is being used in, vs always using the IP:xferPort as the
Datanode's name, and using the name everywhere.
Target Version/s: 2.2.0-alpha
Affects Version/s: 1.0.0, 2.0.0-alpha
Summary: Add option for clients to contact DNs via hostname
(was: Add option for clients to contact DNs via hostname in branch-1)
> Add option for clients to contact DNs via hostname
> --------------------------------------------------
>
> Key: HDFS-3150
> URL: https://issues.apache.org/jira/browse/HDFS-3150
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: data-node, hdfs client
> Affects Versions: 1.0.0, 2.0.0-alpha
> Reporter: Eli Collins
> Assignee: Eli Collins
> Fix For: 1.1.0
>
> Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt
>