[ https://issues.apache.org/jira/browse/HBASE-18095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024640#comment-17024640 ]

Bharath Vissapragada commented on HBASE-18095:
----------------------------------------------

For those following this work, here is an update on the current status. Most 
of the critical patches are already in (see the subtasks). Just two pending 
PRs [1, 2] remain; both are approved and should be committed soon. The nightly 
test runs on the feature branch [3] are relatively stable: the flaky 
TestFromClientSide is fixed by [2], and the remaining test failures are not 
specific to this branch. Once the remaining patches are committed (they switch 
the default registry), I will keep an eye on the nightly job for a few days to 
make sure nothing else is broken. At this point there are no bugs, feature 
gaps, or performance concerns that I'm aware of (please correct me if I'm 
wrong).

Now that we are close to getting all the needed patches in, I'd like to 
kickstart the discussion of merging the feature branch into master and hear 
what other folks think. Maintaining both branches is painful since it requires 
frequent rebases. Given the current status of the work, the nightly test runs, 
and the test coverage, I'm fairly confident this work can land in the master 
branch. There may be some initial friction for developers writing unit tests, 
since the code changes are invasive: they touch many commonly used codepaths 
(connection setup, the RPC framework, etc.) and rewrite a number of tests and 
common test utilities. But I think those issues can be handled on a 
case-by-case basis. Worst case, we have a kill-switch for the entire feature 
(a one-line configuration change) if we really need it.
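To make the kill-switch concrete, a rough sketch of the client-side toggle in 
hbase-site.xml might look like the following. The property names and class 
names here are assumptions based on the feature branch, not the final API:

```xml
<!-- Illustrative only: property and class names are assumptions. -->
<property>
  <!-- Selects the connection registry implementation. Switching this back
       to the ZooKeeper-based registry is the one-line kill-switch. -->
  <name>hbase.client.registry.impl</name>
  <value>org.apache.hadoop.hbase.client.MasterRegistry</value>
</property>
<property>
  <!-- Masters the client contacts directly for meta location/cluster ID. -->
  <name>hbase.masters</name>
  <value>master1.example.com:16000,master2.example.com:16000</value>
</property>
```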

If everyone agrees to merge the feature branch, the follow-up work would be to
 - Document the feature (client and developer facing) and expectations - 
HBASE-23331
 - Work on the backports for branch-2 and branch-1 (targeting the upcoming 
2.3.0)
 - Do some perf runs to see how it compares with the baseline (ZK-based 
registry) and watch how the nightly runs behave
 - Fix test coverage gaps, if any.

[~apurtell] [~ndimiduk] [~stack] thoughts?

References:
 [1] [https://github.com/apache/hbase/pull/1039]
 [2] [https://github.com/apache/hbase/pull/1091]
 [3] 
[https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18095%252Fclient-locate-meta-no-zookeeper/]

> Provide an option for clients to find the server hosting META that does not 
> involve the ZooKeeper client
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18095
>                 URL: https://issues.apache.org/jira/browse/HBASE-18095
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client
>            Reporter: Andrew Kyle Purtell
>            Assignee: Bharath Vissapragada
>            Priority: Major
>         Attachments: HBASE-18095.master-v1.patch, HBASE-18095.master-v2.patch
>
>
> Clients are required to connect to ZooKeeper to find the location of the 
> regionserver hosting the meta table region. Site configuration provides the 
> client a list of ZK quorum peers and the client uses an embedded ZK client to 
> query the meta location. Timeouts and retry behavior of this embedded ZK 
> client are managed orthogonally to HBase-layer settings, and in some cases 
> the ZK client cannot do what the HBase client in theory can, i.e. fail fast 
> upon an outage or network partition.
> We should consider new configuration settings that provide a list of 
> well-known master and backup master locations, and with this information the 
> client can contact any of the master processes directly. Any master in either 
> active or passive state will track meta location and respond to requests for 
> it with its cached last known location. If this location is stale, the client 
> can ask again with a flag set that requests the master refresh its location 
> cache and return the up-to-date location. Every client interaction with the 
> cluster thus uses only HBase RPC as transport, with appropriate settings 
> applied to the connection. The configuration toggle that enables this 
> alternative meta location lookup should be false by default.
> This removes the requirement that HBase clients embed the ZK client and 
> contact the ZK service directly at the beginning of the connection lifecycle. 
> This has several benefits. The ZK service need not be exposed to clients 
> and their potential abuse, while none of the benefits ZK provides to the 
> HBase server cluster are compromised. Normalizing HBase client and ZK 
> client timeout settings and retry behavior - impossible in some cases, 
> i.e. for fail-fast - is no longer necessary. 
> And, from [~ghelmling]: There is an additional complication here for 
> token-based authentication. When a delegation token is used for SASL 
> authentication, the client uses the cluster ID obtained from Zookeeper to 
> select the token identifier to use. So there would also need to be some 
> Zookeeper-less, unauthenticated way to obtain the cluster ID as well. 
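
To summarize the lookup flow the quoted description proposes, here is a 
pseudocode sketch. The call names are illustrative, not the committed RPC 
interface:

```
// Sketch of the proposed meta-location lookup. Any master, active or
// standby, can answer from its cached last known location; the refresh
// flag asks it to re-resolve a stale entry.
locateMeta(client):
  for master in configured_masters:
    loc = master.getMetaLocation(refresh=false)   // cached location
    if client.canConnect(loc): return loc
    // Cached location was stale: ask the master to refresh and retry.
    loc = master.getMetaLocation(refresh=true)
    if client.canConnect(loc): return loc
  fail("no configured master reachable")
```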



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
