[ 
https://issues.apache.org/jira/browse/HBASE-23889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044704#comment-17044704
 ] 

Bharath Vissapragada commented on HBASE-23889:
----------------------------------------------

I think there are two separate issues here.

bq. after merging back the feature branch, the flaky list for master branch 
became really big.

I think the flakiness started after HBASE-23779 and related patches. That 
increased the fork count and that exposed a lot of flakes. I'm happy to dig 
into failures that you think are caused by the feature branch.

bq. And also, remove the invalidateConnection method in HBTU, just reset the 
property in Configuration to let the client load the new configuration 
automatically.

I do see your point, but I think there are some finer issues here.

1. This pattern of picking random ports after every role restart is very 
specific to the unit-tests to avoid port conflicts. This is not reflective of a 
real world usage pattern. Because every restart picks a totally different 
<port>, the older master:<port> becomes useless. In a normal world where 
restarts happen (and the master:<port> remains the same), the connections are 
resilient. 

2. Getting rid of "invalidateConnection()" is an enormous task (as you 
already noted). Our HBTU usage is pretty erratic, and most of the tests directly 
operate on the underlying LocalHBaseCluster (see my comment in the code, 
quoted below). This pretty much means that one needs to rewrite most of the 
tests written over the past 10 (?) years. 

{noformat}
   * TODO: There should be a more coherent way of doing this. Unfortunately the way tests are
   *   written, not all start() stop() calls go through this class. Most tests directly operate on
   *   the underlying mini/local hbase cluster. That makes it difficult for this wrapper class to
   *   maintain the connection state automatically. Cleaning this is a much bigger refactor.
{noformat}

3. The whole premise of the feature is that the new bootstrap set of nodes for 
clients is the HMaster group. Earlier, ZK had this role. This means that the 
client expects the masters to be generally available to perform basic 
operations. This made some test scenarios outdated (like 
test-connection-when-cluster-is-not-up etc). This is why we have some 
test utilities like {{AlwaysStandByHMaster}} to help transition the tests into 
the newer world.
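
To make point 1 concrete, here is a minimal sketch in plain Java. It uses no HBase classes: the key {{sketch.masters}} and the two client classes are hypothetical names invented for illustration, and a plain Map stands in for the Configuration. It only shows why a client that copies the master address once goes stale when a test restart picks a new port, while a client that re-reads the configuration does not.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleMasterAddressSketch {
    // Hypothetical key for illustration only; not an actual HBase property name.
    static final String MASTERS_KEY = "sketch.masters";

    /** Copies the master address once, at construction time. */
    static final class CachingClient {
        private final String masterAddr;

        CachingClient(Map<String, String> conf) {
            this.masterAddr = conf.get(MASTERS_KEY);
        }

        String target() {
            return masterAddr;
        }
    }

    /** Re-reads the configuration on every lookup. */
    static final class ReloadingClient {
        private final Map<String, String> conf;

        ReloadingClient(Map<String, String> conf) {
            this.conf = conf;
        }

        String target() {
            return conf.get(MASTERS_KEY);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MASTERS_KEY, "localhost:16000");

        CachingClient caching = new CachingClient(conf);
        ReloadingClient reloading = new ReloadingClient(conf);

        // Simulate a test-style restart: the master comes back on a different port
        // and the test harness updates the shared configuration.
        conf.put(MASTERS_KEY, "localhost:16123");

        System.out.println("caching   -> " + caching.target());   // still the old port
        System.out.println("reloading -> " + reloading.target()); // the new port
    }
}
```

If the restart keeps the same master:<port>, both clients resolve to the same address, which is the "normal world" case described in point 1.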

That said, I still agree with you that dynamic reconfiguration on the 
client side would be a nice-to-have feature, but IMHO that alone shouldn't be 
the reason to switch the default registry back to the ZK-based one. WDYT?

> Switch back to ZKConnectionRegistry by default at least in test
> ---------------------------------------------------------------
>
>                 Key: HBASE-23889
>                 URL: https://issues.apache.org/jira/browse/HBASE-23889
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, rpc, test
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>
> For now, MasterRegistry can not deal with master restart, as it can not load 
> the new master address automatically.
> I see there is an invalidateConnection method in HBaseTestingUtilities but it 
> needs a very big refactoring to make all the UTs work like this.
> So here I suggest we switch back to ZKConnectionRegistry by default, and open 
> a new feature branch to finish the TODOs and the refactoring on UTs.
> Right now this is already a big problem for me, as I want to merge a feature 
> branch back to master but the state of the UTs is a mess.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
