[ 
https://issues.apache.org/jira/browse/HBASE-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049753#comment-15049753
 ] 

Yong Zheng commented on HBASE-14958:
------------------------------------

Thanks, Nick, for the prompt response.

After checking the prerequisites, I don't think DNS can solve this issue.

My virtualized HBase cluster has only 4 nodes:
n03docker1(172.17.1.2)
n03docker2(172.17.1.3)

n04docker1(172.17.2.2)
n04docker2(172.17.2.3)

DNS is not configured, but I did configure /etc/hosts:
# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

172.17.1.1   c3m3n03docker.gpfs.net c3m3n03docker   <== the br0 on the physical node c3m3n03
172.17.2.1   c3m3n04docker.gpfs.net c3m3n04docker   <== the br0 on the physical node c3m3n04

172.17.1.2   n03docker1.gpfs.net n03docker1
172.17.1.3   n03docker2.gpfs.net n03docker2
172.17.2.2   n04docker1.gpfs.net n04docker1
172.17.2.3   n04docker2.gpfs.net n04docker2

So name resolution works (I do see the correct names for n03docker1 and 
n03docker2). However, for any region server located on the other physical 
machine, all network packets from that region server are source NATed 
to the IP of c3m3n04 (192.168.3.114). That is, the source IP of every packet 
is rewritten to 192.168.3.114 so that the packets can be delivered 
to the physical node c3m3n03.
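
The source NAT behaviour described above can be modelled with a small sketch. This is only an illustration of the address rewrite, not Docker or iptables code: the class SnatSketch and the helper observedSource are hypothetical names, and matching the bridge subnet by string prefix is a simplification.

```java
class SnatSketch {
    // Hypothetical helper: returns the source IP that the receiver observes.
    // bridgePrefix is the local br0 subnet (e.g. "172.17.2."), and hostIp is
    // the physical node's address on enp11s0f0.
    static String observedSource(String srcIp, String dstIp,
                                 String bridgePrefix, String hostIp) {
        // Traffic that stays on the local bridge keeps its container address.
        if (dstIp.startsWith(bridgePrefix)) {
            return srcIp;
        }
        // Traffic that leaves the physical node is source NATed to the host IP.
        return hostIp;
    }

    public static void main(String[] args) {
        // n03docker2 -> n03docker1 (same bridge on c3m3n03): no rewrite.
        System.out.println(observedSource(
                "172.17.1.3", "172.17.1.2", "172.17.1.", "192.168.3.113"));
        // n04docker2 -> n03docker1 (crosses physical nodes): rewritten.
        System.out.println(observedSource(
                "172.17.2.3", "172.17.1.2", "172.17.2.", "192.168.3.114"));
    }
}
```

In this model the master on n03docker1 sees n03docker2 as 172.17.1.3, but sees n04docker2 as 192.168.3.114.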

To the HBase master, 192.168.3.113 and 192.168.3.114 are invisible, so 
DNS resolution for 192.168.3.114 inside the VM doesn't help. For example, 
192.168.3.114's hostname should be c3m3n04, not n04docker1 or n04docker2.
If we configure DNS inside the VM to map 192.168.3.114 to n04docker1 or 
n04docker2, this will mess up the IP-to-hostname mapping inside the VM. Also, 
if we map 192.168.3.114 to n04docker1, we can't start a 2nd region server 
on the same physical node, because both region servers would be recognized 
by the physical node's IP address/hostname.
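
As a rough sketch of why this fails, the following models the exchange reported in the issue below. It is a simplified stand-in and not the actual HBase code: the class HostnameCheckSketch and both method names are hypothetical, and it assumes the master simply names a server after the remote address it observes.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class HostnameCheckSketch {
    // Assumption: the master names a region server after the remote address
    // it sees on the incoming request (here, the SNATed address).
    static String nameAssignedByMaster(String observedRemoteAddress) {
        return observedRemoteAddress;
    }

    // Region server side: compare its own hostname with the one passed back.
    static String checkAssignedName(String was, String now) {
        if (now.equals(was)) {
            return "ok";
        }
        return "Master passed us a different hostname to use; was=" + was
                + ", but now=" + now;
    }

    public static void main(String[] args) {
        String natAddress = "192.168.3.114"; // c3m3n04, as seen by the master

        // n04docker2 knows itself by its own name, the master by the NAT IP.
        System.out.println(checkAssignedName(
                "n04docker2.gpfs.net", nameAssignedByMaster(natAddress)));

        // Both region servers on c3m3n04 end up with the same name.
        Set<String> names = new LinkedHashSet<>();
        names.add(nameAssignedByMaster(natAddress)); // from n04docker1
        names.add(nameAssignedByMaster(natAddress)); // from n04docker2
        System.out.println(names.size() + " distinct name(s) for 2 region servers");
    }
}
```

Under these assumptions, the first line printed has the same shape as the error in the region server log, and the second shows the two region servers behind c3m3n04 collapsing into a single name.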

> regionserver.HRegionServer: Master passed us a different hostname to use; 
> was=n04docker2, but now=192.168.3.114
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14958
>                 URL: https://issues.apache.org/jira/browse/HBASE-14958
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.2
>         Environment: physical machines: redhat7.1
> docker version: 1.9.1
>            Reporter: Yong Zheng
>
> I have two physical machines: c3m3n03docker and c3m3n04docker.
> I started two docker instances per physical node. the topology is like:
> n03docker1(172.17.1.2)  -\
>                                           | br0(172.17.1.1)  +  c3m3n03
> n03docker2(172.17.1.3) -/
> n04docker1(172.17.2.2)  -\
>                                           | br0(172.17.2.1)  +  c3m3n04
> n04docker2(172.17.2.3) -/
> For the physical machines, c3m3n03 has physical adapter enp11s0f0 
> with IP 192.168.3.113/16; c3m3n04 has physical adapter 
> enp11s0f0 with IP 192.168.3.114/16. These two physical adapters are 
> connected to the same switch.
> Note: br0 is not bound to the physical adapter enp11s0f0 on either node, so 
> all requests from 172.17.2.x are source NATed to 192.168.3.114 (c3m3n04) and 
> forwarded to c3m3n03.
> n03docker1: hbase(1.1.2) master
> n03docker2: region server
> n04docker1: region server
> n04docker2: region server
> I first started n03docker1 and n03docker2, and that works; after that, I 
> started n04docker2 and it reported:
> 2015-12-09 08:01:58,259 ERROR 
> [regionserver/n04docker2.gpfs.net/172.17.2.3:16020] 
> regionserver.HRegionServer: Master passed us a different hostname to use; 
> was=n04docker2.gpfs.net, but now=192.168.3.114
> on the master logs:
> 2015-12-09 08:11:12,234 INFO  
> [PriorityRpcServer.handler=0,queue=0,port=16000] master.ServerManager: 
> Registering server=192.168.3.114,16020,1449666670721
> So, you see, when the HBase master receives requests from n04docker2, all 
> these requests are source NATed to 192.168.3.114 (not 172.17.2.3), and the 
> HBase master passes 192.168.3.114 back to 172.17.2.3 (n04docker2). Thus, 
> n04docker2 (172.17.2.3) reports the error in its logs.
> Doesn't HBase support running in a virtualized cluster? SNAT is 
> widely used in virtualization. If the HBase master takes the remote 
> hostname/IP (thus getting 192.168.3.114) and passes it back to the region 
> server, it will hit this issue.
> HBASE-8667 doesn't fix this issue, because that fix is already in HBase 0.98 
> (I'm using HBase 1.1.2).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
