[ 
https://issues.apache.org/jira/browse/HADOOP-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HADOOP-6867.
-----------------------------------------

    Resolution: Not a Problem

I believe this is not a problem anymore after other JIRAs such as HDFS-4963.  
Please feel free to reopen this if it is not the case.  Resolving ...

> Using socket address for datanode registry breaks multihoming
> -------------------------------------------------------------
>
>                 Key: HADOOP-6867
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6867
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>         Environment: hadoop-0.20-0.20.2+228-1, centos 5, distcp
>            Reporter: Jordan Sissel
>
> Related: 
> * https://issues.apache.org/jira/browse/HADOOP-985
> * https://issues.apache.org/jira/secure/attachment/12350813/HADOOP-985-1.patch
> * http://old.nabble.com/public-IP-for-datanode-on-EC2-td19336240.html
> * http://www.cloudera.com/blog/2008/12/securing-a-hadoop-cluster-through-a-gateway/
>  
> Datanodes register using their DNS name (configurable with 
> dfs.datanode.dns.interface). However, the Namenode only uses the source 
> address that the registration came from when handing datanode locations to 
> clients that want to write to HDFS.
> Specific environment that causes this problem:
> * Datanode and Namenode multihomed on two networks.
> * Datanode registers to namenode using dns name on network #1
> * Client (distcp) connects to namenode on network #2 (*) and is told to 
> write to datanodes on network #1, which doesn't work for us.
> (*) Allowing contact to the namenode on multiple networks was achieved with 
> a socat proxy hack that tunnels network #2 to network #1 port 8020. This is 
> unrelated to the issue at hand.
> The Cloudera link above recommends proxying for reasons other than 
> multihoming. It would work here, but it doesn't sound like it would work 
> well (bandwidth, multiplicity, multitenancy, etc).
> Our specific scenario is wanting to distcp over a different network interface 
> than the datanodes register themselves on, but it would be nice if both (all) 
> interfaces worked. We are internally going to patch hadoop to roll back parts 
> of the patch mentioned above so that we rely on the datanode name rather than 
> the socket address it uses to talk to the namenode. The alternate option is 
> to push config changes to all nodes that force them to listen/register on one 
> specific interface only. This helps us work around our specific problem, but 
> doesn't really help with multihoming. 
> I would propose that datanodes register all of their interface addresses 
> during registration (or heartbeat, or whichever process handles this). HDFS 
> clients would then be given all addresses for a node and could select among 
> them accordingly (or take 'whichever worked first'), much as clients do with 
> round-robin DNS.
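
For illustration only, the "whichever worked first" selection proposed in the quoted description might look roughly like the sketch below. This is not Hadoop code and not the patch discussed in this issue: the class name DatanodeAddressPicker, both method names, the example addresses (taken from the documentation-reserved ranges), and port 50010 are all assumptions made for the example. It assumes the client has somehow been handed every address a datanode registered and tries them in order.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper, not part of Hadoop: picks one address out of several
// that a multihomed datanode might have registered.
public class DatanodeAddressPicker {

    // Returns the first candidate, in list order, for which the probe succeeds.
    public static Optional<InetSocketAddress> pickFirstReachable(
            List<InetSocketAddress> candidates,
            Predicate<InetSocketAddress> probe) {
        for (InetSocketAddress addr : candidates) {
            if (probe.test(addr)) {
                return Optional.of(addr);
            }
        }
        return Optional.empty();
    }

    // One possible probe: a plain TCP connect with a timeout.
    // Any I/O failure (refused, timed out, unresolvable) counts as unreachable.
    public static boolean tcpConnects(InetSocketAddress addr, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(addr, timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed example: one datanode, one address per network.
        List<InetSocketAddress> candidates = Arrays.asList(
                new InetSocketAddress("192.0.2.10", 50010),     // network #1
                new InetSocketAddress("198.51.100.10", 50010)); // network #2

        Optional<InetSocketAddress> chosen =
                pickFirstReachable(candidates, a -> tcpConnects(a, 1000));

        System.out.println(chosen.map(a -> "using " + a)
                .orElse("no registered address was reachable"));
    }
}
```

Passing the probe in as a parameter keeps the selection logic separate from the network check, so a client could swap the TCP connect for a subnet match or a configured preference without touching the loop.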



--
This message was sent by Atlassian JIRA
(v6.2#6252)
