[
https://issues.apache.org/jira/browse/HBASE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813446#comment-17813446
]
Duo Zhang commented on HBASE-28339:
-----------------------------------
We do not have backoff when retrying here?
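For illustration only, a minimal sketch (not existing HBase code; all names and values here are assumptions) of the kind of capped exponential backoff with jitter that could be applied between reconnect attempts, so that a down peer cluster does not translate into a tight retry loop:

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical sketch of capped exponential backoff with full jitter.
 * BASE_SLEEP_MS and MAX_SLEEP_MS are illustrative values, not HBase defaults.
 */
public class BackoffSketch {
  static final long BASE_SLEEP_MS = 100;
  static final long MAX_SLEEP_MS = 60_000;

  /** Returns the sleep time (ms) before the given retry attempt (0-based). */
  static long backoffMillis(int attempt) {
    // Exponential growth, with the shift clamped to avoid overflow,
    // then capped so waits stay bounded.
    long exp = BASE_SLEEP_MS << Math.min(attempt, 20);
    long capped = Math.min(exp, MAX_SLEEP_MS);
    // Full jitter in [0, capped] spreads reconnects so many region
    // servers do not hit ZooKeeper (and the KDC) in lockstep.
    return ThreadLocalRandom.current().nextLong(capped + 1);
  }

  public static void main(String[] args) {
    for (int attempt = 0; attempt < 5; attempt++) {
      System.out.println("attempt " + attempt + " -> sleep <= "
        + Math.min(BASE_SLEEP_MS << Math.min(attempt, 20), MAX_SLEEP_MS) + " ms");
    }
  }
}
```

The jitter matters as much as the cap: without it, every endpoint that lost its session at the same moment would retry at the same moment.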
> HBaseReplicationEndpoint creates new ZooKeeper client every time it tries to
> reconnect
> --------------------------------------------------------------------------------------
>
> Key: HBASE-28339
> URL: https://issues.apache.org/jira/browse/HBASE-28339
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 2.6.0, 2.4.17, 3.0.0-beta-1, 2.5.7, 2.7.0
> Reporter: Andor Molnar
> Assignee: Andor Molnar
> Priority: Major
>
> Abstract base class {{HBaseReplicationEndpoint}}, and therefore
> {{HBaseInterClusterReplicationEndpoint}}, creates a new ZooKeeper client
> instance every time an error occurs in communication and it tries to
> reconnect. This was not a problem with ZooKeeper 3.4.x versions, because the
> TGT Login thread was a static reference and only created once for all clients
> in the same JVM. With the upgrade to ZooKeeper 3.5.x the login thread is
> dedicated to the client instance, hence we have a new login thread every time
> the replication endpoint reconnects.
> {code:java}
> /**
>  * A private method used to re-establish a zookeeper session with a peer cluster.
>  */
> protected void reconnect(KeeperException ke) {
>   if (
>     ke instanceof ConnectionLossException || ke instanceof SessionExpiredException
>       || ke instanceof AuthFailedException
>   ) {
>     String clusterKey = ctx.getPeerConfig().getClusterKey();
>     LOG.warn("Lost the ZooKeeper connection for peer " + clusterKey, ke);
>     try {
>       reloadZkWatcher();
>     } catch (IOException io) {
>       LOG.warn("Creation of ZookeeperWatcher failed for peer " + clusterKey, io);
>     }
>   }
> }{code}
> {code:java}
> /**
>  * Closes the current ZKW (if not null) and creates a new one
>  * @throws IOException If anything goes wrong connecting
>  */
> synchronized void reloadZkWatcher() throws IOException {
>   if (zkw != null) zkw.close();
>   zkw = new ZKWatcher(ctx.getConfiguration(),
>     "connection to cluster: " + ctx.getPeerId(), this);
>   getZkw().registerListener(new PeerRegionServerListener(this));
> } {code}
> If the target cluster of replication is unavailable for some reason, the
> replication endpoint keeps trying to reconnect to ZooKeeper, constantly
> destroying and creating new login threads, which will carpet-bomb the KDC
> host with login requests.
>
> I'm not sure how to fix this yet, trying to create a unit test first.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)