[GitHub] [hbase] Apache9 commented on pull request #2570: HBASE-25126 Add load balance logic in hbase-client to distribute read…

GitBox Tue, 20 Oct 2020 19:51:30 -0700


Apache9 commented on pull request #2570:
URL: https://github.com/apache/hbase/pull/2570#issuecomment-713259579



   On a high level design, we used to have a `hbase.meta.replicas.use` config 
to control whether to make use meta replicas at client side. Do we want to 
deprecate this config and let our new configs rule here? Just asking.
   
   For me, I think the scope of `hbase.meta.replicas.use` is too wide, as it 
will impact all the client operations on meta(not sure if we have hacked all 
the places but at least it is designed to). After reviewing the PR, I think our 
approach here is only for the locator related logic right? This is also what I 
expect. In general, I think there are 3 ways we access meta table at client 
side:
   
   1. Locator related logic. This is the most critical path at client side and 
also makes the most pressure on meta table.
   2. Admin related logic. We have delegated most of the operaations through 
master but there are still some places we will access meta directly. But admin 
operation is expected to be low frequent so I do not think we need to deal with 
it here.
   3. Users access meta table directly by their own. This is controlled by user 
written code so I do not think we need to deal with it either, users should 
take care of it by their own.
   
   I think only 1 is what we really care here, so I suggest that, we just 
narrow the scope of the newly introduced configs to be locator only(maybe by 
adding 'locator' keyword in the config name), and consider it first before 
`hbase.meta.replicas.use` in the locator related logic. So users do not need to 
set `hbase.meta.replicas.use` to true to enable this feature, just select the 
LB mode. If the new config is disabled(set to None maybe), then we honor the 
old `hbase.meta.replicas.use` config.
   
   In general, I think the abstraction and trick here are both good. Setting 
the replica id directly on Query is a straight forward way to archive our goal 
here, and the chooser or selector is also a good abstraction. The only concern 
is how to implement the 'fallback to primary' logic as we need to pass down 
from the rpc retrying caller of the actual exception type, anyway, this can be 
done later.
   
   And I suggest we just make this PR against master branch, it is only client 
side change and whether we have implement the meta region replication should 
not impact the code. Why I suggest this is that, the code for master and 
branch-2 will be different, as on branch-2, the sync client has its own logic 
to do region locating, which is not well constructed I believe(we expose a 
bunch of locating methods in ClusterConnection interface and use it 
everywhere). So if we want to include this feature in 2.4.0, we'd better make 
this PR against master, and also backport it to branch-2 ASAP.
   
   Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hbase] Apache9 commented on pull request #2570: HBASE-25126 Add load balance logic in hbase-client to distribute read…

Reply via email to