[
https://issues.apache.org/jira/browse/HBASE-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396887#comment-17396887
]
Michael Stack commented on HBASE-26149:
---------------------------------------
The one-pager helped. Thanks. I put it here as the Jira description, copying
the per-sub-task write-ups that were in the document but missing from the
sub-task JIRAs' descriptions. Hopefully that makes it easier for others trying
to follow along with what's going on here (I put some questions on the
document for my own clarification). Thanks.
> Further improvements on ConnectionRegistry implementations
> ----------------------------------------------------------
>
> Key: HBASE-26149
> URL: https://issues.apache.org/jira/browse/HBASE-26149
> Project: HBase
> Issue Type: Umbrella
> Components: Client
> Reporter: Duo Zhang
> Priority: Major
>
> (Copied in-line from the attached 'Documentation' with some filler as
> connecting script)
> HBASE-23324 Deprecate clients that connect to Zookeeper
> ^^^ This has always been our goal: to remove the zookeeper dependency from
> the client side.
>
> See the sub-task HBASE-25051 DIGEST based auth broken for MasterRegistry
> When constructing an RpcClient, we pass the cluster id in, and it is used to
> select the authentication method. More specifically, it is used to select the
> tokens for digest based authentication; please see the code in
> BuiltInProviderSelector. For ZKConnectionRegistry we do not need an RpcClient
> to connect to zookeeper, so we can get the cluster id first and then create
> the RpcClient. But for MasterRegistry/RpcConnectionRegistry, we need an
> RpcClient to connect to the ClientMetaService endpoints before we can call
> the getClusterId method to get the cluster id. Because of this, when creating
> the RpcClient for MasterRegistry/RpcConnectionRegistry, we can only pass null
> or the default cluster id, which means digest based authentication is broken.
> This is a cyclic dependency problem. A possible way forward is to make the
> getClusterId method available to all users, i.e. require no authentication
> for it, so we can always call getClusterId with simple authentication; once
> the client has the cluster id, it creates a new RpcClient that selects the
> correct authentication method.
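> A rough sketch of the two-phase bootstrap described above. The
> ClusterIdFetcher interface and RegistryBootstrap class are hypothetical glue
> for illustration; RpcClientFactory/RpcClient are the existing (internal)
> client classes, but this is not the actual implementation.
>
> // Hypothetical sketch of the proposed flow, not the real HBase internals.
> import java.io.IOException;
> import java.net.InetSocketAddress;
> import java.util.List;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.ipc.RpcClient;
> import org.apache.hadoop.hbase.ipc.RpcClientFactory;
>
> interface ClusterIdFetcher {
>   // Calls ClientMetaService.getClusterId using simple authentication only,
>   // which works because the method would require no authentication.
>   String fetchClusterId(List<InetSocketAddress> bootstrapNodes) throws IOException;
> }
>
> final class RegistryBootstrap {
>   static RpcClient connect(Configuration conf, ClusterIdFetcher fetcher,
>       List<InetSocketAddress> bootstrapNodes) throws IOException {
>     // Step 1: fetch the cluster id without any token/kerberos requirement.
>     String clusterId = fetcher.fetchClusterId(bootstrapNodes);
>     // Step 2: build the real RpcClient with the cluster id so that
>     // BuiltInProviderSelector can pick the digest token for this cluster.
>     return RpcClientFactory.createClient(conf, clusterId);
>   }
> }
>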
> The work in the sub-task HBASE-26150, Let region server also carry
> ClientMetaService, makes it so the RegionServers can also carry a
> ConnectionRegistry (rather than have only the Masters carry it, as is the
> case now). It adds a new method, getBootstrapNodes, to ClientMetaService, the
> ConnectionRegistry proto Service, for refreshing the bootstrap nodes
> periodically or on error. The new *RpcConnectionRegistry* [created here but
> described in the next sub-task] will use this method to refresh the bootstrap
> nodes, while the old MasterRegistry will use the getMasters method to refresh
> the ‘bootstrap’ nodes.
> The getBootstrapNodes method returns all the region servers, so after the
> first refresh the client will go to region servers for later rpc calls. But
> since masters and region servers both implement the ClientMetaService
> interface, the client is free to configure masters as the initial bootstrap
> nodes.
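> To make the last point concrete, here is a minimal client-side sketch. It
> assumes the hbase.client.registry.impl and hbase.client.bootstrap.servers
> keys used by RpcConnectionRegistry; the host:port values are examples only.
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Admin;
> import org.apache.hadoop.hbase.client.Connection;
> import org.apache.hadoop.hbase.client.ConnectionFactory;
>
> public class RpcRegistryClientExample {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = HBaseConfiguration.create();
>     // Select RpcConnectionRegistry explicitly (HBASE-26174 proposes making
>     // it the default on 3.0.0, after which this line is unnecessary).
>     conf.set("hbase.client.registry.impl",
>         "org.apache.hadoop.hbase.client.RpcConnectionRegistry");
>     // Any ClientMetaService endpoint works as an initial bootstrap node;
>     // masters and region servers are interchangeable here.
>     conf.set("hbase.client.bootstrap.servers", "master1:16000,rs1:16020,rs2:16020");
>     try (Connection conn = ConnectionFactory.createConnection(conf);
>         Admin admin = conn.getAdmin()) {
>       System.out.println("cluster id: " + admin.getClusterMetrics().getClusterId());
>     }
>   }
> }
>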
> The following sub-task then deprecates MasterRegistry: HBASE-26172 Deprecated
> MasterRegistry
> The implementation of MasterRegistry is almost the same as
> RpcConnectionRegistry, except that it uses getMasters instead of
> getBootstrapNodes to refresh the ‘bootstrap’ nodes it connects to. So we can
> add configs on the server side to control which nodes we return to clients in
> getBootstrapNodes, i.e. masters or region servers; then RpcConnectionRegistry
> can fully replace the old MasterRegistry, and MasterRegistry is deprecated.
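> A minimal sketch of that server-side switch, assuming a boolean config; the
> key name and helper class below are hypothetical, only the
> masters-vs-region-servers choice is from the sub-task.
>
> import java.util.List;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.ServerName;
>
> final class BootstrapNodesChooser {
>   // Hypothetical config key; the real one is whatever HBASE-26172 defines.
>   static final String RETURN_MASTERS_KEY = "hbase.registry.bootstrap.return.masters";
>
>   static List<ServerName> getBootstrapNodes(Configuration conf,
>       List<ServerName> masters, List<ServerName> regionServers) {
>     // Return masters when configured to, so RpcConnectionRegistry can mimic
>     // the old MasterRegistry behaviour; otherwise return region servers.
>     return conf.getBoolean(RETURN_MASTERS_KEY, false) ? masters : regionServers;
>   }
> }
>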
> Sub-task HBASE-26173 Return only a sub set of region servers as bootstrap
> nodes
> For a large cluster, which may have thousands of region servers, it is not a
> good idea to return all the region servers as bootstrap nodes to clients. So
> we should add a config on the server side to control the maximum number of
> bootstrap nodes we return to clients. I think a default of 5 or 10 would be
> enough.
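> A sketch of the bounded random sample this suggests; the config key below is
> a placeholder, only the idea of capping the returned list at N nodes is from
> the sub-task.
>
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.List;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.ServerName;
>
> final class BootstrapNodesLimiter {
>   // Placeholder key and default; the sub-task suggests a default of 5 or 10.
>   static final String MAX_NODES_KEY = "hbase.registry.bootstrap.max.nodes";
>
>   static List<ServerName> limit(Configuration conf, List<ServerName> regionServers) {
>     int max = conf.getInt(MAX_NODES_KEY, 10);
>     if (regionServers.size() <= max) {
>       return regionServers;
>     }
>     // Shuffle so different clients get different subsets, spreading the load.
>     List<ServerName> shuffled = new ArrayList<>(regionServers);
>     Collections.shuffle(shuffled);
>     return shuffled.subList(0, max);
>   }
> }
>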
> Sub-task HBASE-26174 Make rpc connection registry the default registry on
> 3.0.0
> Just a follow-up to HBASE-26172. Since MasterRegistry has been deprecated, we
> should no longer make it the default for 3.0.0.
> Sub-task HBASE-26180 Introduce a initial refresh interval for
> RpcConnectionRegistry
> Since end users could configure any nodes in a cluster as the initial
> bootstrap nodes, different end users may all configure the same machine,
> which could overload it. So we should have a shorter delay for the initial
> refresh, to let clients quickly switch to the bootstrap nodes we want them to
> connect to.
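> Roughly, the registry would schedule one quick refresh soon after startup and
> the normal periodic refresh after that; the class, intervals and supplier
> below are illustrative only, not the actual implementation.
>
> import java.util.List;
> import java.util.concurrent.Executors;
> import java.util.concurrent.ScheduledExecutorService;
> import java.util.concurrent.TimeUnit;
> import java.util.function.Supplier;
>
> final class BootstrapNodeRefresher {
>   private volatile List<String> bootstrapNodes;
>   private final ScheduledExecutorService scheduler =
>       Executors.newSingleThreadScheduledExecutor();
>
>   void start(Supplier<List<String>> fetchFromAnyEndpoint) {
>     Runnable refresh = () -> bootstrapNodes = fetchFromAnyEndpoint.get();
>     // Short initial delay: move clients off the user-configured nodes quickly.
>     scheduler.schedule(refresh, 10, TimeUnit.SECONDS);
>     // Regular periodic refresh afterwards (values here are illustrative).
>     scheduler.scheduleAtFixedRate(refresh, 300, 300, TimeUnit.SECONDS);
>   }
>
>   List<String> current() { return bootstrapNodes; }
> }
>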
> Sub-task HBASE-26181 Region server and master could use itself as
> ConnectionRegistry
> This is an optimization to reduce the pressure on zookeeper. For
> MasterRegistry, we do not want to use it as the ConnectionRegistry for our
> cluster connection because:
> // We use ZKConnectionRegistry for all the internal communication, primarily
> // for these reasons:
> // - Decouples RS and master life cycles. RegionServers can continue be up
> //   independent of masters' availability.
> // - Configuration management for region servers (cluster internal) is much
> //   simpler when adding new masters or removing existing masters, since only
> //   clients' config needs to be updated.
> // - We need to retain ZKConnectionRegistry for replication use anyway, so we
> //   just extend it for other internal connections too.
> The above comments are in our code, in the HRegionServer.cleanupConfiguration
> method.
> But now that masters and region servers both implement the ClientMetaService
> interface, we are free to let the ConnectionRegistry make use of this
> in-memory information directly, instead of going to zookeeper again.
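> Conceptually (all names below are hypothetical, not the actual HBASE-26181
> classes), a master or region server can answer registry questions from its
> own in-memory state instead of asking zookeeper:
>
> // Stand-in for what HMaster/HRegionServer already keep in memory.
> interface InMemoryClusterState {
>   String clusterId();
>   String activeMasterAddress();
>   String metaRegionLocations();
> }
>
> // Hypothetical in-process ConnectionRegistry: no zookeeper round trips,
> // every answer comes straight from the hosting process.
> final class InProcessConnectionRegistry {
>   private final InMemoryClusterState state;
>
>   InProcessConnectionRegistry(InMemoryClusterState state) {
>     this.state = state;
>   }
>
>   String getClusterId() { return state.clusterId(); }
>   String getActiveMaster() { return state.activeMasterAddress(); }
>   String getMetaRegionLocations() { return state.metaRegionLocations(); }
> }
>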
> Sub-task HBASE-26182 Allow disabling refresh of connection registry endpoint
> One possible deployment in production is to put something like LVS in front
> of all the region servers to act as a load balancer, so clients just need to
> connect to the LVS IP instead of going to the region servers directly to get
> registry information.
> For this scenario we do not need to refresh the endpoints any more.
> The simplest way is to set the refresh interval to -1.
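> For that setup, the client config might look roughly like the fragment below
> (building on the client sketch earlier); hbase.client.bootstrap.servers is
> the RpcConnectionRegistry key, while the refresh-interval key name is a
> placeholder since this sub-task only states that -1 should disable
> refreshing.
>
> Configuration conf = HBaseConfiguration.create();
> // Point the registry at the LVS virtual IP only (example address).
> conf.set("hbase.client.bootstrap.servers", "lvs-vip.example.com:16020");
> // Placeholder key: -1 disables the periodic refresh so the client keeps
> // talking to the load balancer address.
> conf.set("hbase.client.bootstrap.refresh.interval.secs", "-1");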
--
This message was sent by Atlassian Jira
(v8.3.4#803005)