[
https://issues.apache.org/jira/browse/HBASE-25903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350019#comment-17350019
]
Anoop Sam John commented on HBASE-25903:
----------------------------------------
Issue observed
*+DNS resolution problem for peer zk causes replication source initialization
to hang forever+*
In a replication-enabled cluster, there is an occasional issue with the DNS
system. When an RS starts, it sometimes fails to resolve the peer zk
hostname. This is not a permanent issue.
But when this happens, the ReplicationSource initialization gets stuck
forever, WALs accumulate, and the replication lag keeps growing. The only
way out is to manually restart the RS.
We are on 2.1.6
HBaseInterClusterReplicationEndpoint creates an AsyncClusterConnection, which
in turn fetches the peer clusterID.
{code}
ConnectionRegistry registry = ConnectionRegistryFactory.getRegistry(conf);
String clusterId = FutureUtils.get(registry.getClusterId());
{code}
In zk clients that do not have the fix for ZOOKEEPER-2184, this causes an
IllegalArgumentException on ZooKeeper instance creation.
In ReadOnlyZKClient#run
{code}
ZooKeeper zk;
try {
  zk = getZk();
} catch (IOException e) {
  task.connectFailed(e);
  continue;
}
task.exec(zk);
{code}
In case of an IOE, we retry a fixed number of times and finally give up.
An IllegalArgumentException is not an IOException, so it is not caught here.
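The difference between the two failure paths can be sketched in isolation. This is a minimal standalone illustration, not HBase code: ZkTaskSketch, ZkFactory, runUnguarded, runGuarded and timesOut are hypothetical names, and the factory only simulates a ZooKeeper constructor that throws a non-IOException. It assumes the caller blocks on a CompletableFuture that only the worker thread can complete.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ZkTaskSketch {

  /** Hypothetical stand-in for getZk(); not part of HBase or ZooKeeper. */
  interface ZkFactory {
    Object create() throws IOException;
  }

  /** Mirrors the snippet above: only IOException completes the future. */
  static CompletableFuture<String> runUnguarded(ZkFactory factory) {
    CompletableFuture<String> future = new CompletableFuture<>();
    Thread worker = new Thread(() -> {
      try {
        factory.create();
        future.complete("cluster-id");
      } catch (IOException e) {
        future.completeExceptionally(e);
      }
      // A RuntimeException escapes the try block, the thread dies,
      // and the future is never completed.
    });
    worker.setDaemon(true);
    // Suppress the default stack trace print for the escaping exception.
    worker.setUncaughtExceptionHandler((t, e) -> { });
    worker.start();
    return future;
  }

  /** Possible handling: any exception fails the future, so waiters are released. */
  static CompletableFuture<String> runGuarded(ZkFactory factory) {
    CompletableFuture<String> future = new CompletableFuture<>();
    Thread worker = new Thread(() -> {
      try {
        factory.create();
        future.complete("cluster-id");
      } catch (Exception e) {
        // Covers IOException and runtime exceptions such as IllegalArgumentException.
        future.completeExceptionally(e);
      }
    });
    worker.setDaemon(true);
    worker.start();
    return future;
  }

  /** Returns true if the future is still incomplete after the given wait. */
  static boolean timesOut(CompletableFuture<String> future, long millis)
      throws InterruptedException {
    try {
      future.get(millis, TimeUnit.MILLISECONDS);
      return false;
    } catch (TimeoutException e) {
      return true;
    } catch (ExecutionException e) {
      return false; // completed exceptionally, so the waiter was released
    }
  }

  public static void main(String[] args) throws Exception {
    ZkFactory broken = () -> {
      throw new IllegalArgumentException("simulated unresolvable host");
    };
    System.out.println("unguarded still waiting: " + timesOut(runUnguarded(broken), 500));
    System.out.println("guarded still waiting: " + timesOut(runGuarded(broken), 500));
  }
}
```

In the unguarded variant a get() without a timeout would never return, which matches the hang described here. Catching the broader exception type and failing the future is one possible way to release the waiting thread; the actual fix may differ.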
> ReadOnlyZKClient APIs - CompletableFuture.get() calls can cause threads to
> hang forever when ZK client create throws Non IOException
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-25903
> URL: https://issues.apache.org/jira/browse/HBASE-25903
> Project: HBase
> Issue Type: Bug
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Priority: Major
>
> This applies to zk client versions that do not have the fix for
> ZOOKEEPER-2184.
> We are now on zookeeper 3.5.7 on the active 2.x branches. Still, it is better
> to handle this case in our code.