[ 
https://issues.apache.org/jira/browse/HBASE-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049930#comment-15049930
 ] 

Yong Zheng commented on HBASE-14958:
------------------------------------

I ran a simple test with tcp_server on n03docker2 (172.17.1.3) and 
tcp_client on n04docker2 (172.17.2.3).

in tcp_server:
...
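                sin_size = sizeof(struct sockaddr_in);
                /* accept() fills client_addr with the peer address as the server sees it */
                if ((new_fd = accept(sockfd, (struct sockaddr *)&client_addr,
                                     &sin_size)) == -1)
                {
                        fprintf(stderr, "Accept   error:%s\n\a", strerror(errno));
                        exit(1);
                }
                fprintf(stderr, "Server   get   connection   from   %x\n",
                        client_addr.sin_addr.s_addr);

                /* getpeername() must be called on the connected socket (new_fd);
                   the listening socket (sockfd) has no peer */
                sin_size = sizeof(struct sockaddr_in);
                ret = getpeername(new_fd, (struct sockaddr *)&client_peer_addr,
                                  &sin_size);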
...

tcp_client just connects to the server and sends one message.

bash-4.1# hostname
n04docker2
bash-4.1# ./tcp_client 172.17.1.3 8030

on tcp_server,
bash-4.1# hostname
n03docker2.gpfs.net
bash-4.1# ./tcp_server 8030
will accepting...
Server   get   connection ...
Server   get   connection   from   7203a8c0 <== read byte by byte from the 
low end, this is 192.168.3.114 (the address of c3m3n04, not 172.17.2.3).
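
The conversion of the printed value can be checked with a small helper. This 
is only a sketch: format_printed_addr is a hypothetical name, and it assumes 
the value was printed with %x on a little-endian host (such as x86), so the 
lowest byte of the printed number is the first octet of the address.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Assumption: raw is sin_addr.s_addr as printed with %x on a little-endian
 * host, so the lowest byte of raw is the first octet of the IPv4 address.
 * format_printed_addr is a hypothetical helper, not part of tcp_server.
 */
static void format_printed_addr(uint32_t raw, char *out, size_t out_len)
{
    snprintf(out, out_len, "%u.%u.%u.%u",
             (unsigned)(raw & 0xff),          /* first octet  */
             (unsigned)((raw >> 8) & 0xff),   /* second octet */
             (unsigned)((raw >> 16) & 0xff),  /* third octet  */
             (unsigned)((raw >> 24) & 0xff)); /* fourth octet */
}
```

Calling format_printed_addr(0x7203a8c0, buf, sizeof buf) with a 16-byte buf 
gives "192.168.3.114". Inside the server itself, 
inet_ntop(AF_INET, &client_addr.sin_addr, buf, sizeof buf) would produce the 
dotted form directly, without depending on host byte order.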

So when source NAT is involved in the virtualization layer, it looks to me 
like the current HBase master/region server mechanism doesn't work. Maybe we 
could have the region server and master exchange hostnames explicitly, rather 
than depending on the socket API to get the client IP address.
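
As a rough illustration of that idea in the same test setup (not HBase code), 
the client could announce its own hostname in its first message and the server 
could use that value instead of the address returned by accept(). The 
"HELLO <hostname>\n" format and the names build_hello and parse_hello are 
assumptions made up for this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wire format for this sketch: "HELLO <hostname>\n". */
#define HELLO_PREFIX "HELLO "

/* Client side: build the first message from the client's own hostname
 * (for example, the value returned by gethostname()). Returns 0 on success,
 * -1 if the message does not fit in buf. */
static int build_hello(char *buf, size_t buf_len, const char *hostname)
{
    int n = snprintf(buf, buf_len, HELLO_PREFIX "%s\n", hostname);
    return (n > 0 && (size_t)n < buf_len) ? 0 : -1;
}

/* Server side: recover the announced hostname from the message instead of
 * relying on the (possibly NATed) address filled in by accept().
 * Returns 0 on success, -1 on a malformed or oversized message. */
static int parse_hello(const char *msg, char *host, size_t host_len)
{
    size_t prefix_len = strlen(HELLO_PREFIX);
    const char *end;
    size_t len;

    if (strncmp(msg, HELLO_PREFIX, prefix_len) != 0)
        return -1;
    msg += prefix_len;
    end = strchr(msg, '\n');
    if (end == NULL)
        return -1;
    len = (size_t)(end - msg);
    if (len == 0 || len >= host_len)
        return -1;
    memcpy(host, msg, len);
    host[len] = '\0';
    return 0;
}
```

With this, a client on n04docker2 would send "HELLO n04docker2.gpfs.net\n" and 
the server would register that name even though the connection appears to 
come from 192.168.3.114.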


> regionserver.HRegionServer: Master passed us a different hostname to use; 
> was=n04docker2, but now=192.168.3.114
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14958
>                 URL: https://issues.apache.org/jira/browse/HBASE-14958
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.2
>         Environment: physical machines: redhat7.1
> docker version: 1.9.1
>            Reporter: Yong Zheng
>
> I have two physical machines: c3m3n03docker and c3m3n04docker.
> I started two Docker instances on each physical node. The topology is:
> n03docker1(172.17.1.2) -\
>                          | br0(172.17.1.1)  +  c3m3n03
> n03docker2(172.17.1.3) -/
> n04docker1(172.17.2.2) -\
>                          | br0(172.17.2.1)  +  c3m3n04
> n04docker2(172.17.2.3) -/
> On the physical machines, c3m3n03 uses physical adapter enp11s0f0 with IP 
> 192.168.3.113/16, and c3m3n04 uses physical adapter enp11s0f0 with IP 
> 192.168.3.114/16. Both physical adapters are connected to the same switch.
> Note: br0 is not attached to the physical adapter enp11s0f0 on either node, 
> so all requests from 172.17.2.x are source NATed to 192.168.3.114 (c3m3n04) 
> before being forwarded to c3m3n03.
> n03docker1: hbase(1.1.2) master
> n03docker2: region server
> n04docker1: region server
> n04docker2: region server
> I first started n03docker1 and n03docker2, which worked. After that, I 
> started n04docker2 and it reported:
> 2015-12-09 08:01:58,259 ERROR 
> [regionserver/n04docker2.gpfs.net/172.17.2.3:16020] 
> regionserver.HRegionServer: Master passed us a different hostname to use; 
> was=n04docker2.gpfs.net, but now=192.168.3.114
> In the master logs:
> 2015-12-09 08:11:12,234 INFO  
> [PriorityRpcServer.handler=0,queue=0,port=16000] master.ServerManager: 
> Registering server=192.168.3.114,16020,1449666670721
> So when the HBase master receives requests from n04docker2, all of these 
> requests are source NATed to 192.168.3.114 (not 172.17.2.3), and the HBase 
> master passes 192.168.3.114 back to 172.17.2.3 (n04docker2). As a result, 
> n04docker2 (172.17.2.3) reports the error above in its logs.
> Does HBase not support running in a virtualized cluster? SNAT is widely 
> used in virtualization. If the HBase master takes the remote hostname/IP 
> from the connection (and so gets 192.168.3.114) and passes it back to the 
> region server, it will hit this issue.
> HBASE-8667 doesn't fix this issue: that fix is already in HBase 0.98, and 
> I'm running HBase 1.1.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
