[
https://issues.apache.org/jira/browse/HBASE-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Stack updated HBASE-26149:
----------------------------------
Description:
(Copied in-line from the attached 'Documentation' with some filler as
connecting script)
HBASE-23324 Deprecate clients that connect to Zookeeper
^^^ This has always been our goal: to remove the ZooKeeper dependency from the
client side.
See the sub-task HBASE-25051 DIGEST based auth broken for MasterRegistry
When constructing an RpcClient, we pass in the cluster id, which is used to
select the authentication method. More specifically, it is used to select the
tokens for digest-based authentication; see the code in
BuiltInProviderSelector. ZKConnectionRegistry does not need an RpcClient to
connect to ZooKeeper, so we can get the cluster id first and then create the
RpcClient. But MasterRegistry/RpcConnectionRegistry must use an RpcClient to
connect to the ClientMetaService endpoints before it can call the getClusterId
method to get the cluster id. Because of this, when creating the RpcClient for
MasterRegistry/RpcConnectionRegistry, we can only pass null or the default
cluster id, which means digest-based authentication is broken.
This is a cyclic dependency problem. One possible way forward is to make the
getClusterId method available to all users, i.e. not requiring any
authentication, so that we can always call getClusterId with simple
authentication. Then, on the client side, once we have the cluster id, we
create a new RpcClient that selects the correct authentication method.
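The two-phase idea above could be sketched roughly as follows. This is only an
illustration: SketchRpcClient, ClusterIdFetcher and connect are hypothetical
names, not HBase classes, and the real auth selection (see
BuiltInProviderSelector) is more involved than a choice between two strings.

```java
import java.util.Set;

/** Hypothetical sketch; none of these types are real HBase classes. */
final class ClusterIdBootstrapSketch {

  /** Stand-in for an RPC client bound to one cluster id and auth method. */
  static final class SketchRpcClient {
    final String clusterId;  // null while the real cluster id is still unknown
    final String authMethod; // "SIMPLE" or "DIGEST" in this sketch

    SketchRpcClient(String clusterId, String authMethod) {
      this.clusterId = clusterId;
      this.authMethod = authMethod;
    }
  }

  /** Stand-in for a getClusterId call that requires no authentication. */
  interface ClusterIdFetcher {
    String fetch(SketchRpcClient bootstrapClient);
  }

  /**
   * Phase 1: use a throwaway client with simple auth to learn the cluster id.
   * Phase 2: build a new client whose auth method is chosen from that id.
   * clusterIdsWithTokens models "we hold a digest token for this cluster".
   */
  static SketchRpcClient connect(ClusterIdFetcher fetcher, Set<String> clusterIdsWithTokens) {
    SketchRpcClient bootstrap = new SketchRpcClient(null, "SIMPLE");
    String clusterId = fetcher.fetch(bootstrap);
    boolean hasToken = clusterId != null && clusterIdsWithTokens.contains(clusterId);
    return new SketchRpcClient(clusterId, hasToken ? "DIGEST" : "SIMPLE");
  }
}
```

The point of the sketch is that the first client is never reused: it exists
only to break the cycle between "need cluster id to pick auth" and "need a
client to ask for the cluster id".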
The work in the sub-task HBASE-26150, Let region server also carry
ClientMetaService, makes it possible for the RegionServers to carry a
ConnectionRegistry (rather than only the Masters carrying it, as is the case
now). It adds a new method, getBootstrapNodes, to ClientMetaService, the
ConnectionRegistry proto Service, for refreshing the bootstrap nodes
periodically or on error. The new *RpcConnectionRegistry* [created here but
defined in the next sub-task] will use this method to refresh the bootstrap
nodes, while the old MasterRegistry will use the getMasters method to refresh
the ‘bootstrap’ nodes.
The getBootstrapNodes method returns all the region servers, so after the
first refresh, the client goes to region servers for later RPC calls. But
since masters and region servers both implement the ClientMetaService
interface, the client is free to configure masters as the initial bootstrap
nodes.
HBASE-26172 Deprecated MasterRegistry and allow getBootstrapNodes to return
master address instead of region server
The implementation of MasterRegistry is almost the same as
RpcConnectionRegistry, except that it uses getMasters instead of
getBootstrapNodes to refresh the ‘bootstrap’ nodes it connects to. So we could
add server-side configs to control which nodes getBootstrapNodes returns to
the client, i.e., masters or region servers; then RpcConnectionRegistry can
fully replace the old MasterRegistry. After this change, we can deprecate
MasterRegistry.
h1. HBASE-26173 Return only a sub set of region servers as bootstrap nodes
For a large cluster, which may have thousands of region servers, it is not a
good idea to return all of them as bootstrap nodes to clients. So we should
add a server-side config to control the maximum number of bootstrap nodes
returned to clients. I think a default value of 5 or 10 would be enough.
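As a rough illustration of capping the returned nodes, the sketch below picks
a random subset of at most maxNodes servers. Random sampling is an assumption
made here to spread clients across servers; the description above only asks
for a maximum. BootstrapNodeSampler and its parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of limiting how many bootstrap nodes are returned. */
final class BootstrapNodeSampler {

  /**
   * Returns at most maxNodes entries chosen at random from regionServers.
   * The input list is not modified.
   */
  static List<String> sample(List<String> regionServers, int maxNodes, Random random) {
    if (maxNodes <= 0) {
      throw new IllegalArgumentException("maxNodes must be positive: " + maxNodes);
    }
    List<String> shuffled = new ArrayList<>(regionServers);
    Collections.shuffle(shuffled, random);
    int count = Math.min(maxNodes, shuffled.size());
    // Copy the prefix so the caller does not hold a view of the larger list.
    return new ArrayList<>(shuffled.subList(0, count));
  }
}
```

With a limit of 10, a cluster of a thousand region servers would hand each
client a different small set, while a three-node cluster would simply return
all three.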
h1. HBASE-26174 Make rpc connection registry the default registry on 3.0.0
Just a follow-up to HBASE-26172. Since MasterRegistry has been deprecated, we
should no longer make it the default for 3.0.0.
h1. HBASE-26180 Introduce a initial refresh interval for RpcConnectionRegistry
As end users can configure any nodes in a cluster as the initial bootstrap
nodes, different end users may configure the same machine, which would
overload that machine. So we should use a shorter delay for the initial
refresh, to let users quickly switch to the bootstrap nodes we want them to
connect to.
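A minimal sketch of the "shorter first delay" idea, assuming the refresher
tracks whether it has refreshed at least once. The class, method and parameter
names are hypothetical and do not correspond to actual HBase config keys.

```java
/** Hypothetical sketch of choosing the delay before the next registry refresh. */
final class RefreshScheduleSketch {

  /**
   * Before the first successful refresh, wait only initialDelayMillis so the
   * client moves off the user-configured bootstrap nodes quickly. Afterwards,
   * fall back to the normal periodic interval. The initial delay is capped by
   * the periodic interval so a misconfiguration cannot make it longer.
   */
  static long nextDelayMillis(boolean refreshedOnce, long initialDelayMillis,
      long periodicIntervalMillis) {
    if (refreshedOnce) {
      return periodicIntervalMillis;
    }
    return Math.min(initialDelayMillis, periodicIntervalMillis);
  }
}
```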
h1. HBASE-26181 Region server and master could use itself as ConnectionRegistry
This is an optimization to reduce the pressure on ZooKeeper. We do not want to
use MasterRegistry as the ConnectionRegistry for our cluster connection
because:
// We use ZKConnectionRegistry for all the internal communication, primarily for these reasons:
// - Decouples RS and master life cycles. RegionServers can continue be up independent of
//   masters' availability.
// - Configuration management for region servers (cluster internal) is much simpler when adding
//   new masters or removing existing masters, since only clients' config needs to be updated.
// - We need to retain ZKConnectionRegistry for replication use anyway, so we just extend it for
//   other internal connections too.
The above comments are from our code, in the HRegionServer.cleanupConfiguration
method.
But now that masters and region servers both implement the ClientMetaService
interface, we are free to let the ConnectionRegistry use this in-memory
information directly, instead of going to ZooKeeper again.
h1. HBASE-26182 Allow disabling refresh of connection registry endpoint
One possible production deployment is to put something like LVS in front of
all the region servers to act as a load balancer (LB), so clients just need to
connect to the LVS IP instead of going to the region servers directly to get
registry information.
For this scenario we no longer need to refresh the endpoints.
The simplest way is to set the refresh interval to -1.
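One way the "-1 disables refresh" convention might look on the client side is
sketched below. OptionalRefresher is a hypothetical name, and treating every
non-positive interval as "disabled" is an assumption of this sketch; the
description above only mentions -1.

```java
import java.util.Optional;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a periodic refresh that a config value can turn off. */
final class OptionalRefresher {

  /**
   * Schedules the refresh task every intervalSecs seconds, or schedules
   * nothing when intervalSecs is non-positive (e.g. -1), which is useful when
   * clients sit behind a load balancer and the endpoint never changes.
   */
  static Optional<ScheduledFuture<?>> start(ScheduledExecutorService pool,
      long intervalSecs, Runnable refresh) {
    if (intervalSecs <= 0) {
      return Optional.empty(); // refresh disabled
    }
    ScheduledFuture<?> future =
        pool.scheduleWithFixedDelay(refresh, intervalSecs, intervalSecs, TimeUnit.SECONDS);
    return Optional.<ScheduledFuture<?>>of(future);
  }
}
```

Returning an empty Optional rather than a no-op future makes the disabled case
explicit to callers that later want to cancel the refresh.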
> Further improvements on ConnectionRegistry implementations
> ----------------------------------------------------------
>
> Key: HBASE-26149
> URL: https://issues.apache.org/jira/browse/HBASE-26149
> Project: HBase
> Issue Type: Umbrella
> Components: Client
> Reporter: Duo Zhang
> Priority: Major
>