[
https://issues.apache.org/jira/browse/HBASE-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049930#comment-15049930
]
Yong Zheng commented on HBASE-14958:
------------------------------------
I ran a simple test with tcp_server on n03docker2 (172.17.1.3) and
tcp_client on n04docker2 (172.17.2.3).
In tcp_server:
...
sin_size = sizeof(struct sockaddr_in);
if ((new_fd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size)) == -1)
{
    fprintf(stderr, "Accept error:%s\n\a", strerror(errno));
    exit(1);
}
fprintf(stderr, "Server get connection from %x\n",
        client_addr.sin_addr.s_addr);
/* query the accepted socket; the listening socket has no peer */
sin_size = sizeof(struct sockaddr_in);
ret = getpeername(new_fd, (struct sockaddr *)&client_peer_addr, &sin_size);
...
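For anyone who wants to reproduce the server-side check without two hosts, here is a minimal sketch that connects to itself over loopback. The function name loopback_peer_check is hypothetical and not part of the original tcp_server. It illustrates that getpeername() reports a peer only for a connected socket, so it has to be called on the descriptor returned by accept(), and that both calls return the address the kernel sees on the wire (which, behind SNAT, is the translated one).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect to ourselves over loopback, then report the client address as seen
 * by accept() and by getpeername() on the accepted socket.
 * Both buffers must hold at least INET_ADDRSTRLEN bytes.
 * Returns 0 on success, -1 on any socket error. */
int loopback_peer_check(char *from_accept, char *from_getpeername)
{
    struct sockaddr_in srv, client_addr, peer_addr;
    socklen_t len;
    int listen_fd = -1, client_fd = -1, conn_fd = -1;
    int rc = -1;

    assert(from_accept != NULL && from_getpeername != NULL);

    listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd == -1) goto out;

    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = 0; /* let the kernel pick a free port */
    if (bind(listen_fd, (struct sockaddr *)&srv, sizeof srv) == -1) goto out;
    if (listen(listen_fd, 1) == -1) goto out;

    /* Read back the port the kernel assigned. */
    len = sizeof srv;
    if (getsockname(listen_fd, (struct sockaddr *)&srv, &len) == -1) goto out;

    client_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (client_fd == -1) goto out;
    if (connect(client_fd, (struct sockaddr *)&srv, sizeof srv) == -1) goto out;

    /* Address of the client as filled in by accept(). */
    len = sizeof client_addr;
    conn_fd = accept(listen_fd, (struct sockaddr *)&client_addr, &len);
    if (conn_fd == -1) goto out;

    /* Same question asked through getpeername() on the accepted socket. */
    len = sizeof peer_addr;
    if (getpeername(conn_fd, (struct sockaddr *)&peer_addr, &len) == -1) goto out;

    if (inet_ntop(AF_INET, &client_addr.sin_addr, from_accept,
                  INET_ADDRSTRLEN) == NULL) goto out;
    if (inet_ntop(AF_INET, &peer_addr.sin_addr, from_getpeername,
                  INET_ADDRSTRLEN) == NULL) goto out;
    rc = 0;

out:
    if (conn_fd != -1) close(conn_fd);
    if (client_fd != -1) close(client_fd);
    if (listen_fd != -1) close(listen_fd);
    return rc;
}
```

Calling it with two INET_ADDRSTRLEN-sized buffers should fill both with the same address, since accept() and getpeername() read the same connection state.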
The tcp_client just connects to the server and sends one message.
bash-4.1# hostname
n04docker2
bash-4.1# ./tcp_client 172.17.1.3 8030
On the tcp_server side:
bash-4.1# hostname
n03docker2.gpfs.net
bash-4.1# ./tcp_server 8030
will accepting...
Server get connection ...
Server get connection from 7203a8c0 <== this is 192.168.3.114 once the
address is converted from network byte order to dotted-quad form.
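The hex value is easier to read once formatted. sin_addr.s_addr holds the four octets in network byte order (c0 a8 03 72 in memory), and printing it with %x on a little-endian host shows them reversed as 7203a8c0. A small sketch of the conversion using inet_ntop(); the helper names addr_from_octets and format_ipv4 are made up for illustration:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Build an in_addr from four octets given in the order they are written in
 * dotted-quad notation, which is also network byte order. */
struct in_addr addr_from_octets(unsigned char a, unsigned char b,
                                unsigned char c, unsigned char d)
{
    unsigned char raw[4] = { a, b, c, d };
    struct in_addr addr;

    memcpy(&addr.s_addr, raw, sizeof raw);
    return addr;
}

/* Format an IPv4 address (as stored in sockaddr_in.sin_addr) as text.
 * buf should hold at least INET_ADDRSTRLEN bytes.
 * Returns buf on success, NULL on error. */
const char *format_ipv4(const struct in_addr *addr, char *buf, socklen_t len)
{
    assert(addr != NULL && buf != NULL);
    return inet_ntop(AF_INET, addr, buf, len);
}
```

In tcp_server, passing &client_addr.sin_addr to format_ipv4() and printing the result with %s would show the dotted-quad address directly instead of the raw integer.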
So, where the virtualization layer applies source NAT, it looks to me like
the current HBase master/region server mechanism doesn't work. Maybe we could
have the region server and master exchange hostnames instead of depending on
the socket API to get the client IP address.
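To make the suggestion concrete, here is a rough sketch of what "exchange the hostname" could look like at the protocol level. Everything in it is hypothetical: the "REGISTER <hostname> <port>" message format and the function names are invented for illustration and are not HBase's RPC or any existing HBase API. The point is only that the server prefers the name the client reports about itself and falls back to the socket peer address when none is given.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HOSTNAME_BUF 256 /* assumed upper bound for a hostname plus NUL */

/* Client side: put the client's own hostname into the registration message
 * (hypothetical wire format). Returns 0 on success, -1 if out is too small. */
int build_registration(char *out, size_t len, const char *hostname, int port)
{
    int n;

    assert(out != NULL && hostname != NULL);
    n = snprintf(out, len, "REGISTER %s %d\n", hostname, port);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Server side: extract the hostname and port the client reported.
 * Returns 0 on success, -1 if the message does not match the format. */
int parse_registration(const char *msg, char hostname[HOSTNAME_BUF], int *port)
{
    assert(msg != NULL && hostname != NULL && port != NULL);
    return sscanf(msg, "REGISTER %255s %d", hostname, port) == 2 ? 0 : -1;
}

/* Prefer the name the client reported; fall back to the peer address taken
 * from the socket, which may be the SNAT address rather than the client's. */
const char *effective_name(const char *reported, const char *peer_ip)
{
    return (reported != NULL && reported[0] != '\0') ? reported : peer_ip;
}
```

With this scheme the master would register n04docker2.gpfs.net even though the connection arrives from 192.168.3.114, so the name it passes back matches what the region server already uses.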
> regionserver.HRegionServer: Master passed us a different hostname to use;
> was=n04docker2, but now=192.168.3.114
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-14958
> URL: https://issues.apache.org/jira/browse/HBASE-14958
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.1.2
> Environment: physical machines: redhat7.1
> docker version: 1.9.1
> Reporter: Yong Zheng
>
> I have two physical machines: c3m3n03docker and c3m3n04docker.
> I started two Docker instances per physical node. The topology looks like this:
> n03docker1(172.17.1.2) -\
> | br0(172.17.1.1) + c3m3n03
> n03docker2(172.17.1.3) -/
> n04docker1(172.17.2.2) -\
> | br0(172.17.2.1) + c3m3n04
> n04docker2(172.17.2.3) -/
> On the physical machines, c3m3n03 is bound to physical adapter enp11s0f0
> with IP 192.168.3.113/16, and c3m3n04 is bound to physical adapter
> enp11s0f0 with IP 192.168.3.114/16. These two physical adapters are
> connected to the same switch.
> Note: br0 is not bound to physical adapter enp11s0f0 on either node, so
> all requests from 172.17.2.x are source-NATed to 192.168.3.114 (c3m3n04) and
> forwarded to c3m3n03.
> n03docker1: hbase(1.1.2) master
> n03docker2: region server
> n04docker1: region server
> n04docker2: region server
> I first start n03docker1 and n03docker2, and that works. After that, I start
> n04docker2 and it reports:
> 2015-12-09 08:01:58,259 ERROR
> [regionserver/n04docker2.gpfs.net/172.17.2.3:16020]
> regionserver.HRegionServer: Master passed us a different hostname to use;
> was=n04docker2.gpfs.net, but now=192.168.3.114
> on the master logs:
> 2015-12-09 08:11:12,234 INFO
> [PriorityRpcServer.handler=0,queue=0,port=16000] master.ServerManager:
> Registering server=192.168.3.114,16020,1449666670721
> So when the HBase master receives requests from n04docker2, all of these
> requests are source-NATed to 192.168.3.114 (not 172.17.2.3), and the
> HBase master passes 192.168.3.114 back to 172.17.2.3 (n04docker2). Thus,
> n04docker2 (172.17.2.3) reports the error in its logs.
> Does HBase not support running in a virtualized cluster? SNAT is widely
> used in virtualization. If the HBase master takes the remote hostname/IP
> from the connection (thus getting 192.168.3.114) and passes it back to the
> region server, it will hit this issue.
> HBASE-8667 doesn't fix this issue: that fix has been in HBase since 0.98,
> and I'm running HBase 1.1.2.