[ https://issues.apache.org/jira/browse/HBASE-23889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044704#comment-17044704 ]
Bharath Vissapragada commented on HBASE-23889:
----------------------------------------------
I think there are two separate issues here.
bq. after merging back the feature branch, the flaky list for the master branch became really big.
I think the flakiness started after HBASE-23779 and related patches. Those
increased the test fork count, which exposed a lot of flaky tests. I'm happy to
dig into failures that you think are caused by the feature branch.
bq. And also, remove the invalidateConnection method in HBTU; just reset the property in Configuration to let the client load the new configuration automatically.
I do see your point, but I think there are some finer issues here.
1. This pattern of picking random ports after every role restart is specific
to the unit tests, where it avoids port conflicts. It does not reflect a
real-world usage pattern. Because every restart picks a totally different
<port>, the older master:<port> becomes useless. In a real deployment, where
restarts happen but master:<port> stays the same, the connections are
resilient.
2. Getting rid of "invalidateConnection()" is an enormous task (as you
already noted). Our HBTU usage is pretty inconsistent, and most tests directly
operate on the underlying LocalHBaseCluster, as my comment in the code says.
This pretty much means that one would need to rewrite most of the tests
written over the past 10 (?) years.
{noformat}
* TODO: There should be a more coherent way of doing this. Unfortunately the way tests are
* written, not all start() stop() calls go through this class. Most tests directly operate on
* the underlying mini/local hbase cluster. That makes it difficult for this wrapper class to
* maintain the connection state automatically. Cleaning this is a much bigger refactor.
{noformat}
3. The whole premise of the feature is that the HMaster group is the new
bootstrap set of nodes for clients; earlier, ZK had this role. This means that
clients expect the masters to be generally available to perform basic
operations. As a result, some test scenarios became outdated (like
test-connection-when-cluster-is-not-up etc.). This is why we have test
utilities like {{AlwaysStandByHMaster}} to help transition the tests into the
newer world.
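A minimal, self-contained Java sketch of point 1, assuming a hypothetical client that reads its bootstrap master address once, at construction, from a config map. {{FakeMiniCluster}}, {{CachingClient}}, {{StaleAddressDemo}} and the {{masters}} key are all illustrative names, not HBase APIs. After the hypothetical harness restarts the master on a random port, the cached address is useless until the client is rebuilt, which is roughly the role invalidateConnection() plays in HBTU.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for a mini cluster whose master binds a fresh random
// port on every (re)start, as the unit tests do to avoid port conflicts.
class FakeMiniCluster {
  static final String MASTERS_KEY = "masters"; // hypothetical key, not an HBase constant
  private final Map<String, String> conf;
  private int masterPort = -1;

  FakeMiniCluster(Map<String, String> conf) {
    this.conf = conf;
    restartMaster();
  }

  // Restart the master on a different random port and publish the new
  // address in the shared config, like a harness rewriting its Configuration.
  void restartMaster() {
    int newPort;
    do {
      newPort = ThreadLocalRandom.current().nextInt(20000, 60000);
    } while (newPort == masterPort);
    masterPort = newPort;
    conf.put(MASTERS_KEY, "localhost:" + masterPort);
  }

  boolean isMasterAt(String address) {
    return ("localhost:" + masterPort).equals(address);
  }
}

// Hypothetical client that snapshots the master address once, at creation,
// so it never notices later changes to the config.
class CachingClient {
  private final String masterAddress;

  CachingClient(Map<String, String> conf) {
    this.masterAddress = conf.get(FakeMiniCluster.MASTERS_KEY);
  }

  boolean canReachMaster(FakeMiniCluster cluster) {
    return cluster.isMasterAt(masterAddress);
  }
}

class StaleAddressDemo {
  // Returns reachability {before restart, after restart, after rebuilding the client}.
  static boolean[] run() {
    Map<String, String> conf = new HashMap<>();
    FakeMiniCluster cluster = new FakeMiniCluster(conf);
    CachingClient client = new CachingClient(conf);
    boolean before = client.canReachMaster(cluster);
    cluster.restartMaster();            // master comes back on a new random port
    boolean after = client.canReachMaster(cluster);
    client = new CachingClient(conf);   // the "invalidate and reconnect" step
    boolean rebuilt = client.canReachMaster(cluster);
    return new boolean[] {before, after, rebuilt};
  }
}
```

Running StaleAddressDemo.run() yields {true, false, true}: the stale snapshot only breaks because the port changed, which is the test-only situation described in point 1.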
That said, I still agree with you that dynamic reconfiguration on the client
side would be a nice-to-have feature, but IMHO that alone shouldn't be the
reason to switch the default registry back to the ZK-based one. WDYT?
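For contrast, a minimal sketch of the client-side dynamic reconfiguration idea, assuming the client simply re-reads a hypothetical {{masters}} property (a comma-separated host:port list) from its shared configuration every time it needs bootstrap nodes, instead of caching it. {{ReloadingRegistry}} is an illustrative name; this shows the idea only and is not how MasterRegistry is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical registry that resolves bootstrap masters lazily from a
// shared, mutable configuration on every call (no cached snapshot).
class ReloadingRegistry {
  static final String MASTERS_KEY = "masters"; // hypothetical key, not an HBase constant
  private final Map<String, String> conf;

  ReloadingRegistry(Map<String, String> conf) {
    this.conf = conf;
  }

  // Parse the comma-separated host:port list each time it is requested,
  // so a property reset is picked up by the very next lookup.
  List<String> bootstrapMasters() {
    String value = conf.getOrDefault(MASTERS_KEY, "");
    List<String> result = new ArrayList<>();
    for (String part : value.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        result.add(trimmed);
      }
    }
    return result;
  }
}
```

The trade-off is a config read and parse on every lookup in exchange for never holding a stale address; a real client would still have to re-establish RPC connections to the new endpoints.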
> Switch back to ZKConnectionRegistry by default at least in test
> ---------------------------------------------------------------
>
> Key: HBASE-23889
> URL: https://issues.apache.org/jira/browse/HBASE-23889
> Project: HBase
> Issue Type: Bug
> Components: Client, rpc, test
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
>
> For now, MasterRegistry cannot deal with a master restart, as it cannot load
> the new master address automatically.
> I see there is an invalidateConnection method in HBaseTestingUtility, but it
> would need a very big refactoring to make all the UTs work like this.
> So here I suggest we switch back to ZKConnectionRegistry by default, and open
> a new feature branch to finish the TODOs and the refactoring of the UTs.
> This is already a big problem for me, as I want to merge a feature branch
> back to master but the UTs are in a mess.