[ 
https://issues.apache.org/jira/browse/HBASE-12954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302667#comment-14302667
 ] 

Clay B. commented on HBASE-12954:
---------------------------------

[~apurtell] and [~stack], I think there is some past design I am not catching 
behind the desire to split internal and external HBase identification. Is the 
goal a "split-view" (internal/external) of a cluster, for a network with a 
completely isolated cluster (e.g. the public side is not reachable from 
internal networks, yet external requests can still get in and be answered)? 
I can't quite envision such a network. Certainly, I've seen interesting issues 
running multiple region servers on the same machine, but the port part of the 
{{RegionServerStatus}}(?) must be the key disambiguator there. In the CDH3 days 
(~0.90.4) one could certainly end up with duplicate registration due to 
inconsistent DNS/hostfile entries, and that was bad. This would provide a 
canonical hostname from the region server should one choose to take matters 
into their own hands.

From an operations perspective, I would be more concerned that a mapping file 
(internal and external /etc/hosts for HBase) or script needs to be identical 
across a cluster (e.g. it would need to be updated atomically). A setting in a 
region server's hbase-site.xml, by contrast, would be a single source of truth 
for that region server only and, if incorrect, would only affect that region 
server (provided we can figure out a good way to prevent duplicate 
registration, which could otherwise be a problem). It seems that a script would 
end up needing to query some atomic single source of truth like Zookeeper, 
Consul or etcd in the end anyway (as a master may jump at any time, e.g. due to 
a hardware failure, and one may want to change the hostname of one or more 
region servers), versus distributing responsibility to the region servers and 
having a good check for duplicate registration. (Perhaps this could implement 
some UUID generation scheme, as was suggested in the similar HBASE-3413, if 
protecting users from themselves is a key concern. Also, since work like 
HBASE-5844 has come about since the duplicate registration days, is this still 
as big a problem? Would we delete the znode of a competing region server?)
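
As a rough illustration of the kind of duplicate-registration check meant 
here, the following is a minimal Python sketch, not HBase code. The 
{{Registry}} class, its method names and the per-process instance id are all 
hypothetical; the only assumptions taken from the discussion are that the 
advertised hostname plus port identifies a region server and that a UUID 
generated at process start can tell two claimants apart.

```python
import uuid


class DuplicateRegistration(Exception):
    """Raised when a second process claims an already-registered identity."""


class Registry:
    """Hypothetical master-side table of advertised region server identities."""

    def __init__(self):
        # (hostname, port) -> instance id of the process that currently owns it
        self._owners = {}

    def register(self, hostname, port, instance_id):
        # Hostnames are compared case-insensitively; the port disambiguates
        # several region servers on one machine.
        key = (hostname.lower(), port)
        owner = self._owners.get(key)
        if owner is not None and owner != instance_id:
            raise DuplicateRegistration(f"{hostname}:{port} is already registered")
        self._owners[key] = instance_id

    def unregister(self, hostname, port, instance_id):
        key = (hostname.lower(), port)
        # Only the owning process may release its identity.
        if self._owners.get(key) == instance_id:
            del self._owners[key]


def new_instance_id():
    # Assumption for this sketch: generated once per region server start.
    return str(uuid.uuid4())
```

Whether the loser of such a conflict should be rejected, as here, or should 
displace the existing registration is exactly the open question above; the 
sketch simply picks rejection.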

Again, following the line of a mapping script/file, would the thought be that 
the master enforces some re-registration process, so that if hosta (1.1.1.1) 
becomes hostb (which was 1.1.1.2) we don't allow all sorts of havoc because 
region server identities were changed in the mapping file but no region server 
restart was performed? (Further, how would this be handled gracefully, and how 
would the administrator handle coordination across a cluster?)
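
To make the concern concrete, here is a small Python sketch of the comparison 
a master would have to make after a mapping change. It is purely illustrative: 
the function name and both dictionaries are hypothetical, and it only reports 
the mismatch, leaving the harder question of what to do about it unanswered.

```python
def stale_registrations(registered, mapping):
    """Return addresses whose registered hostname disagrees with the mapping.

    registered: dict of IP address -> hostname the server registered under
    mapping:    dict of IP address -> hostname in the current mapping file
    Both inputs are hypothetical structures for this sketch.
    """
    stale = {}
    for ip, name in registered.items():
        expected = mapping.get(ip)  # None if the address left the mapping
        if expected != name:
            stale[ip] = (name, expected)
    return stale


# The example from the paragraph above: 1.1.1.1 was registered as hosta,
# but the mapping file now calls it hostb and no restart has happened.
registered = {"1.1.1.1": "hosta", "1.1.1.3": "hostc"}
mapping = {"1.1.1.1": "hostb", "1.1.1.3": "hostc"}
needs_reregistration = stale_registrations(registered, mapping)
```

Every address in {{needs_reregistration}} is a server whose identity changed 
underneath it and which would need a restart or forced re-registration.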

> Ability impaired using HBase on multihomed hosts
> ------------------------------------------------
>
>                 Key: HBASE-12954
>                 URL: https://issues.apache.org/jira/browse/HBASE-12954
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.4
>            Reporter: Clay B.
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: 12954-v1.txt, Hadoop Three Interfaces.png
>
>
> For HBase clusters running on unusual networks (such as NAT'd cloud 
> environments or physical machines with multiple IPs per network interface) 
> it would be ideal to have a way to specify both:
> # which IP interface the HBase master or region-server will bind to
> # what hostname HBase will advertise in Zookeeper, for either a master or 
> region-server process
> While efforts such as HBASE-8640 go a long way to normalize these two sources 
> of information, it is not possible in the current design of the properties 
> available to an administrator for these to be unambiguously specified.
> One has been able to request {{hbase.master.ipc.address}} or 
> {{hbase.regionserver.ipc.address}} but one cannot specify the desired HBase 
> {{hbase.master.hostname}}. (It was removed in HBASE-1357; further, I am 
> unaware of a region-server equivalent.)
> I use a configuration management system to generate all of my configuration 
> files on a per-machine basis. As such, an option to generate a file 
> specifying exactly which hostname to use would be helpful.
> Today, specifying the bind address for HBase works and one can use an 
> HBase-only DNS for faking what to put in Zookeeper but this is far from 
> ideal. Network interfaces have no intrinsic IP address, nor hostname. 
> Specifying a DNS server is awkward as the DNS server may differ from the 
> system's resolver and is a single IP address. Similarly, on hosts which use a 
> transient VIP (e.g. through keepalived) for other services, it means there's 
> a seemingly non-deterministic hostname choice made by HBase depending on the 
> state of the VIP at daemon start-up time.
> I will attach two networking examples I use which become very difficult to 
> manage under the current properties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
