[
https://issues.apache.org/jira/browse/HBASE-18095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024640#comment-17024640
]
Bharath Vissapragada commented on HBASE-18095:
----------------------------------------------
For those following this work, here is the current status. Most of the
critical patches are already in (see the subtasks). Just two PRs [1, 2] are
pending; both are already approved and should be committed soon. The nightly
test runs on the feature branch [3] are relatively stable: the flaky
TestFromClientSide is fixed by [2], and the remaining test failures are not
specific to this branch. Once the remaining patches are committed (switching
the default registry), I can keep an eye on the nightly job for a few days to
make sure nothing else is broken. At this point there are no bugs, feature
gaps, or performance concerns that I'm aware of (please correct me if I'm
wrong).
Now that we are close to getting all the needed patches in, I'd like to kick
off the discussion of merging the feature branch into master and see what
other folks think. Maintaining both branches is painful because it requires
frequent rebases. Given the current status of the work, the nightly test runs,
and the test coverage, I'm fairly confident that this work can land in the
master branch. There might be a little initial friction (for developers
writing unit tests) since the code changes are invasive: they touch many
commonly used codepaths (connection setup, the RPC framework, etc.) and
rewrite a number of tests and common test utilities. I think those issues can
be handled on a case-by-case basis. In the worst case, we have a kill switch
for the entire feature (a one-line change) if we really need it.
If everyone agrees to merge the feature branch, the follow-up work would be to:
- Document the feature (client and developer facing) and expectations -
HBASE-23331
- Work on the backports for branch-2 and branch-1 (targeting the upcoming
2.3.0)
- Do some perf runs to see how it compares with the baseline (ZK-based
registry) and see how the nightly runs behave
- Fix test coverage gaps, if any.
[~apurtell] [~ndimiduk] [~stack] thoughts?
References
[1] [https://github.com/apache/hbase/pull/1039]
[2] [https://github.com/apache/hbase/pull/1091]
[3]
[https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18095%252Fclient-locate-meta-no-zookeeper/]
> Provide an option for clients to find the server hosting META that does not
> involve the ZooKeeper client
> --------------------------------------------------------------------------------------------------------
>
> Key: HBASE-18095
> URL: https://issues.apache.org/jira/browse/HBASE-18095
> Project: HBase
> Issue Type: New Feature
> Components: Client
> Reporter: Andrew Kyle Purtell
> Assignee: Bharath Vissapragada
> Priority: Major
> Attachments: HBASE-18095.master-v1.patch, HBASE-18095.master-v2.patch
>
>
> Clients are required to connect to ZooKeeper to find the location of the
> regionserver hosting the meta table region. Site configuration provides the
> client a list of ZK quorum peers and the client uses an embedded ZK client to
> query meta location. Timeouts and retry behavior of this embedded ZK client
> are managed orthogonally to HBase layer settings and in some cases the ZK
> client cannot manage what in theory the HBase client can, i.e. fail fast
> upon outage or network partition.
> We should consider new configuration settings that provide a list of
> well-known master and backup master locations, and with this information the
> client can contact any of the master processes directly. Any master in either
> active or passive state will track meta location and respond to requests for
> it with its cached last known location. If this location is stale, the client
> can ask again with a flag set that requests the master refresh its location
> cache and return the up-to-date location. Every client interaction with the
> cluster thus uses only HBase RPC as transport, with appropriate settings
> applied to the connection. The configuration toggle that enables this
> alternative meta location lookup should be false by default.
> This removes the requirement that HBase clients embed the ZK client and
> contact the ZK service directly at the beginning of the connection lifecycle.
> This has several benefits. The ZK service need not be exposed to clients and
> their potential abuse, yet no benefit ZK provides the HBase server cluster is
> compromised. Normalizing HBase client and ZK client timeout settings and
> retry behavior (in some cases impossible, i.e. for fail-fast) is no longer
> necessary.
> And, from [~ghelmling]: There is an additional complication here for
> token-based authentication. When a delegation token is used for SASL
> authentication, the client uses the cluster ID obtained from ZooKeeper to
> select the token identifier to use. So there would also need to be some
> ZooKeeper-less, unauthenticated way to obtain the cluster ID.
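
To make the lookup flow in the issue description concrete, here is a minimal
Java sketch of it: the client walks a configured list of masters, takes the
cached meta location from the first one that answers, and asks again with a
refresh flag if that location turns out to be stale. This is an illustration
only. The names Master, metaLocation, locateMeta and isStale are hypothetical
and are not HBase APIs, and the real client would use HBase RPC stubs rather
than an in-process interface.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class MetaLookupSketch {

    /** Hypothetical stand-in for an RPC stub to one master (active or backup). */
    interface Master {
        /**
         * Returns the master's cached meta location. With refresh=true the
         * master is asked to refresh its cache before answering.
         */
        String metaLocation(boolean refresh) throws Exception;
    }

    /**
     * Asks each configured master in order and uses the first reply.
     * If the reply is stale according to isStale, the same master is asked
     * once more with the refresh flag set. Returns empty if no master answers,
     * so the caller can fail fast instead of waiting on another service.
     */
    static Optional<String> locateMeta(List<Master> masters, Predicate<String> isStale) {
        for (Master master : masters) {
            try {
                String location = master.metaLocation(false);
                if (isStale.test(location)) {
                    location = master.metaLocation(true);
                }
                return Optional.of(location);
            } catch (Exception e) {
                // This master is unreachable or failed; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // First master is down; the second has a stale cached location.
        Master down = refresh -> { throw new java.io.IOException("unreachable"); };
        Master backup = refresh -> refresh ? "rs2:16020" : "rs1:16020";

        Optional<String> location =
            locateMeta(List.of(down, backup), "rs1:16020"::equals);
        System.out.println(location.orElse("no master reachable"));
    }
}
```

In this sketch staleness is decided by a caller-supplied predicate; in
practice the client would more likely discover it when a request to the
returned regionserver fails.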
--
This message was sent by Atlassian Jira
(v8.3.4#803005)