[ 
https://issues.apache.org/jira/browse/HADOOP-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372935#comment-16372935
 ] 

Greg Senia commented on HADOOP-15250:
-------------------------------------

These parameters (the two DNS ones) would never work in a split-view DNS 
environment with DNS servers correctly configured to determine the hostname. 
Remember, in our scenario we want all local traffic to the cluster to use the 
cluster network, and anything destined for the datacenter networks to use the 
server interfaces. This means you should NOT be binding to a non-routable 
address. This also raises the concern that anyone putting values into 
/etc/hosts could cause the client to bind incorrectly, say to 127.0.0.10. 
Hadoop should be relying on the OS for DNS and IP routing information, like 
other software stacks in the middleware space do. So I guess the question is: 
is Hadoop going to support this multi-homing configuration, which is similar to 
the SDN/Docker setup and our setup here? Nothing in the HWX article states that 
it won't:
https://community.hortonworks.com/articles/24277/parameters-for-multi-homing.html


hadoop.security.dns.nameserver

The host name or IP address of the name server (DNS) which a service Node 
should use to determine its own host name for Kerberos Login. Requires 
hadoop.security.dns.interface. Most clusters will not require this setting.

hadoop.security.dns.interface 
 The name of the Network Interface from which the service should determine its 
host name for Kerberos login. e.g. eth2. In a multi-homed environment, the 
setting can be used to affect the _HOST substitution in the service Kerberos 
principal. If this configuration value is not set, the service will use its 
default hostname as returned by 
InetAddress.getLocalHost().getCanonicalHostName(). Most clusters will not 
require this setting.
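Since the fallback the description above refers to is InetAddress.getLocalHost().getCanonicalHostName(), here is a quick way to see what a given node would use for _HOST substitution (a standalone sketch, not Hadoop code; the class name is mine). A stale /etc/hosts entry like 127.0.0.10 shows up immediately here:

```java
import java.net.InetAddress;

public class HostnameCheck {
    public static void main(String[] args) throws Exception {
        // The value Hadoop falls back to for Kerberos _HOST substitution
        // when hadoop.security.dns.interface is unset.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println(local.getCanonicalHostName()
                + " -> " + local.getHostAddress());
    }
}
```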

In regards to hadoop.rpc.protection: shouldn't that be what guards against 
man-in-the-middle attacks, rather than a null check on whether a hostname has 
an IP associated with it to bind outbound?
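For reference, the java.net.Socket behavior I am relying on: bind(null) defers local-address selection to the OS, so the routing table picks the outbound interface at connect time rather than a pre-resolved hostname pinning the source address. A minimal standalone sketch (the class name is mine, not Hadoop code):

```java
import java.net.Socket;

public class BindNullSketch {
    public static void main(String[] args) throws Exception {
        try (Socket s = new Socket()) {
            s.setReuseAddress(true);
            // Per the java.net.Socket javadoc, a null bindpoint makes the
            // system pick an ephemeral port and a valid local address; the
            // OS routing table then chooses the outbound interface when the
            // socket connects.
            s.bind(null);
            System.out.println("bound to " + s.getLocalSocketAddress());
        }
    }
}
```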

hadoop.rpc.protection privacy
 A comma-separated list of protection values for secured sasl connections. 
Possible values are authentication, integrity and privacy. authentication means 
authentication only and no integrity or privacy; integrity implies 
authentication and integrity are enabled; and privacy implies all of 
authentication, integrity and privacy are enabled. 
hadoop.security.saslproperties.resolver.class can be used to override the 
hadoop.rpc.protection for a connection at the server side.

Here are all of our values for binding to support multi-homed networks per the 
documentation. Unfortunately, using the DNS options is not a valid solution 
with our network design. We did our due diligence and spent almost a week 
formulating a solution to this problem; do not just assume we didn't set the 
parameters.

 

core-site.xml:
ipc.client.nobind.local.addr=true
hadoop.rpc.protection=privacy

hdfs-site.xml:
dfs.client.use.datanode.hostname=true
dfs.datanode.use.datanode.hostname=true
dfs.namenode.http-bind-host=0.0.0.0
dfs.namenode.https-bind-host=0.0.0.0
dfs.namenode.rpc-bind-host=0.0.0.0
dfs.namenode.lifeline.rpc-bind-host=0.0.0.0
dfs.namenode.servicerpc-bind-host=0.0.0.0
dfs.datanode.address=0.0.0.0:1019
dfs.datanode.http.address=0.0.0.0:1022
dfs.datanode.https.address=0.0.0.0:50475
dfs.datanode.ipc.address=0.0.0.0:8010
dfs.journalnode.http-address=0.0.0.0:8480
dfs.journalnode.https-address=0.0.0.0:8481
dfs.namenode.http-address.tech.nn1=ha21t51nn.tech.hdp.example.com:50070
dfs.namenode.http-address.tech.nn2=ha21t52nn.tech.hdp.example.com:50070
dfs.namenode.http-address.unit.nn1=ha21d51nn.unit.hdp.example.com:50070
dfs.namenode.http-address.unit.nn2=ha21d52nn.unit.hdp.example.com:50070
dfs.namenode.https-address.tech.nn1=ha21t51nn.tech.hdp.example.com:50470
dfs.namenode.https-address.tech.nn2=ha21t52nn.tech.hdp.example.com:50470
dfs.namenode.lifeline.rpc-address.tech.nn1=ha21t51nn.tech.hdp.example.com:8050
dfs.namenode.lifeline.rpc-address.tech.nn2=ha21t52nn.tech.hdp.example.com:8050
dfs.namenode.rpc-address.tech.nn1=ha21t51nn.tech.hdp.example.com:8020
dfs.namenode.rpc-address.tech.nn2=ha21t52nn.tech.hdp.example.com:8020
dfs.namenode.rpc-address.unit.nn1=ha21d51nn.unit.hdp.example.com:8020
dfs.namenode.rpc-address.unit.nn2=ha21d52nn.unit.hdp.example.com:8020
dfs.namenode.servicerpc-address.tech.nn1=ha21t51nn.tech.hdp.example.com:8040
dfs.namenode.servicerpc-address.tech.nn2=ha21t52nn.tech.hdp.example.com:8040
dfs.namenode.servicerpc-address.unit.nn1=ha21d51nn.unit.hdp.example.com:8040
dfs.namenode.servicerpc-address.unit.nn2=ha21d52nn.unit.hdp.example.com:8040

hbase-site.xml:
hbase.master.ipc.address=0.0.0.0
hbase.regionserver.ipc.address=0.0.0.0
hbase.master.info.bindAddress=0.0.0.0

mapred-site.xml:
mapreduce.jobhistory.bind-host=0.0.0.0
mapreduce.jobhistory.address=ha21t52mn.tech.hdp.example.com:10020
mapreduce.jobhistory.webapp.address=ha21t52mn.tech.hdp.example.com:19888

 

yarn-site.xml:
yarn.nodemanager.bind-host=0.0.0.0
yarn.resourcemanager.bind-host=0.0.0.0
yarn.timeline-service.bind-host=0.0.0.0
yarn.nodemanager.address=0.0.0.0:45454
yarn.resourcemanager.address=ha21t52mn.tech.hdp.example.com:8050
yarn.resourcemanager.admin.address=ha21t52mn.tech.hdp.example.com:8141
yarn.resourcemanager.resource-tracker.address=ha21t52mn.tech.hdp.example.com:8025
yarn.resourcemanager.scheduler.address=ha21t52mn.tech.hdp.example.com:8030
yarn.resourcemanager.webapp.address=ha21t52mn.tech.hdp.example.com:8088
yarn.resourcemanager.webapp.address.rm1=ha21t52mn.tech.hdp.example.com:8088
yarn.resourcemanager.webapp.address.rm2=ha21t53mn.tech.hdp.example.com:8088
yarn.resourcemanager.webapp.https.address=ha21t52mn.tech.hdp.example.com:8090
yarn.resourcemanager.webapp.https.address.rm1=ha21t52mn.tech.hdp.example.com:8090
yarn.resourcemanager.webapp.https.address.rm2=ha21t53mn.tech.hdp.example.com:8090
yarn.resourcemanager.zk-address=ha21t51mn.tech.hdp.example.com:2181,ha21t52mn.tech.hdp.example.com:2181,ha21t53mn.tech.hdp.example.com:2181
yarn.timeline-service.address=ha21t53mn.tech.hdp.example.com:10200
yarn.timeline-service.webapp.address=ha21t53mn.tech.hdp.example.com:8188
yarn.timeline-service.webapp.https.address=ha21t53mn.tech.hdp.example.com:8190

> Split-DNS MultiHomed Server Network Cluster Network IPC Client Bind Addr Wrong
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-15250
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15250
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc, net
>    Affects Versions: 2.7.3, 2.9.0, 3.0.0
>         Environment: Multihome cluster with split DNS and rDNS lookup of 
> localhost returning non-routable IPAddr
>            Reporter: Greg Senia
>            Priority: Critical
>         Attachments: HADOOP-15250.patch
>
>
> We run our Hadoop clusters with two networks attached to each node. These 
> networks are as follows: a server network that is firewalled with firewalld, 
> allowing inbound traffic only for SSH and things like Knox, HiveServer2, the 
> HTTP YARN RM/ATS, and the MR History Server. The second network is the 
> cluster network on the second network interface; it uses jumbo frames and is 
> open with no restrictions, allowing all cluster traffic to flow between 
> nodes. 
>  
> To resolve DNS within the Hadoop Cluster we use DNS Views via BIND so if the 
> traffic is originating from nodes with cluster networks we return the 
> internal DNS record for the nodes. This all works fine with all the 
> multi-homing features added to Hadoop 2.x
>  Some logic around views:
> a. The internal view is used by cluster machines when performing lookups. So 
> hosts on the cluster network should get answers from the internal view in DNS
> b. The external view is used by non-local-cluster machines when performing 
> lookups. So hosts not on the cluster network should get answers from the 
> external view in DNS
>  
> So this brings me to our problem. We created some firewall rules to allow 
> inbound traffic from each cluster's server network so that distcp could 
> occur. But we noticed a problem almost immediately: when YARN attempted to 
> talk to the remote cluster, it was binding outgoing traffic to the cluster 
> network interface, which IS NOT routable. So after researching the code we 
> noticed the following in NetUtils.java and Client.java. 
> Basically, in Client.java it looks as if it takes whatever the hostname is 
> and attempts to bind to whatever that hostname resolves to. This is not 
> valid in a multi-homed network with one routable interface and one 
> non-routable interface. After reading through the java.net.Socket 
> documentation, it is valid to perform socket.bind(null), which lets the OS 
> routing table and DNS send the traffic out the correct interface. I will 
> also attach the network traces and a test patch for the 2.7.x and 3.x code 
> bases. I have this test fix below in my Hadoop Test Cluster.
> Client.java:
>
>       /*
>        * Bind the socket to the host specified in the principal name of the
>        * client, to ensure Server matching address of the client connection
>        * to host name in principal passed.
>        */
>       InetSocketAddress bindAddr = null;
>       if (ticket != null && ticket.hasKerberosCredentials()) {
>         KerberosInfo krbInfo =
>             remoteId.getProtocol().getAnnotation(KerberosInfo.class);
>         if (krbInfo != null) {
>           String principal = ticket.getUserName();
>           String host = SecurityUtil.getHostFromPrincipal(principal);
>           // If host name is a valid local address then bind socket to it
>           InetAddress localAddr = NetUtils.getLocalInetAddress(host);
>           if (localAddr != null) {
>             this.socket.setReuseAddress(true);
>             if (LOG.isDebugEnabled()) {
>               LOG.debug("Binding " + principal + " to " + localAddr);
>             }
>             bindAddr = new InetSocketAddress(localAddr, 0);
>           }
>         }
>       }
>  
> So in my Hadoop 2.7.x Cluster I made the following changes and traffic flows 
> correctly out the correct interfaces:
>  
> diff --git 
> a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
>  
> b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
> index e1be271..c5b4a42 100644
> --- 
> a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
> +++ 
> b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
> @@ -305,6 +305,9 @@
>    public static final String  IPC_CLIENT_FALLBACK_TO_SIMPLE_AUTH_ALLOWED_KEY 
> = "ipc.client.fallback-to-simple-auth-allowed";
>    public static final boolean 
> IPC_CLIENT_FALLBACK_TO_SIMPLE_AUTH_ALLOWED_DEFAULT = false;
>  
> +  public static final String  IPC_CLIENT_NO_BIND_LOCAL_ADDR_KEY = 
> "ipc.client.nobind.local.addr";
> +  public static final boolean IPC_CLIENT_NO_BIND_LOCAL_ADDR_DEFAULT = false;
> +
>    public static final String IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SASL_KEY =
>      "ipc.client.connect.max.retries.on.sasl";
>    public static final int    IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SASL_DEFAULT 
> = 5;
> diff --git 
> a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
>  
> b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
> index a6f4eb6..7bfddb7 100644
> --- 
> a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
> +++ 
> b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
> @@ -129,7 +129,9 @@ public static void setCallIdAndRetryCount(int cid, int 
> rc) {
>  
>    private final int connectionTimeout;
>  
> +
>    private final boolean fallbackAllowed;
> +  private final boolean noBindLocalAddr;
>    private final byte[] clientId;
>    
>    final static int CONNECTION_CONTEXT_CALL_ID = -3;
> @@ -642,7 +644,11 @@ private synchronized void setupConnection() throws 
> IOException {
>                InetAddress localAddr = NetUtils.getLocalInetAddress(host);
>                if (localAddr != null) {
>                  this.socket.setReuseAddress(true);
> -                this.socket.bind(new InetSocketAddress(localAddr, 0));
> +                if (noBindLocalAddr) {
> +                  this.socket.bind(null);
> +                } else {
> +                  this.socket.bind(new InetSocketAddress(localAddr, 0));
> +                }
>                }
>              }
>            }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
