[ https://issues.apache.org/jira/browse/HBASE-25627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bharath Vissapragada resolved HBASE-25627.
------------------------------------------
    Resolution: Fixed

Thanks [~sandeep.pal]

> HBase replication should have a metric to represent if the source is stuck getting initialized
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25627
>                 URL: https://issues.apache.org/jira/browse/HBASE-25627
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.5, 2.4.3
>            Reporter: Sandeep Pal
>            Assignee: Sandeep Pal
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.3
>
>
> There can be a situation where the cluster is not able to talk to the peer
> cluster's ZK. In that case the logQueue keeps accumulating, but without
> digging into the logs we cannot tell why the logQueue is growing on the
> source.
> Since the replication source doesn't even start the shipper in this case, it
> would be good to have a dedicated metric that shows when the RS cannot talk
> to the peer's ZK at all.
>
>
> {code:java}
> 2021-03-03 04:02:10,704 DEBUG [peerId] zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper, quorum=zookeeper-0.zookeeper-a.fakeAddress:2181,zookeeper-1.zookeeper-a.fakeAddress:2181,zookeeper-2.zookeeper-a.fakeAddress:2181,zookeeper-3.zookeeper-a.fakeAddress:2181,zookeeper-4.zookeeper-a.fakeAddress:2181, exception=org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
> 2021-03-03 04:02:10,704 DEBUG [peerId] zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper, quorum=zookeeper-0.zookeeper-a.fakeAddress:2181,zookeeper-1.zookeeper-a.fakeAddress:2181,zookeeper-2.zookeeper-a.fakeAddress:2181,zookeeper-3.zookeeper-a.fakeAddress:2181,zookeeper-4.zookeeper-a.fakeAddress:2181, exception=org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
> 	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1119)
> 	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:284)
> 	at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:469)
> 	at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
> 	at org.apache.hadoop.hbase.zookeeper.ZKClusterId.getUUIDForCluster(ZKClusterId.java:96)
> 	at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.getPeerUUID(HBaseReplicationEndpoint.java:104)
> 	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:306)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
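A minimal Java sketch of the kind of metric this issue asks for. It assumes that the source keeps retrying a peer-UUID lookup (the `getPeerUUID()` call visible in the stack trace) until it gets a result, and that a failed ZooKeeper lookup such as the `AuthFailed` error surfaces as `null`. `SourceInitSketch`, `InitMetrics`, `SourceSketch`, `tryInitialize()` and the `sourcesInitializing` gauge are hypothetical names chosen for illustration, not the API added by the fix; a real region server would publish the gauge through its metrics system instead of printing it.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: names and structure are illustrative, not HBase's API.
public class SourceInitSketch {

  /** Hypothetical gauge holder: number of sources still trying to initialize. */
  static final class InitMetrics {
    private final AtomicInteger sourcesInitializing = new AtomicInteger();

    void incrSourceInitializing() { sourcesInitializing.incrementAndGet(); }

    void decrSourceInitializing() { sourcesInitializing.decrementAndGet(); }

    int getSourcesInitializing() { return sourcesInitializing.get(); }
  }

  /** Hypothetical replication source that must resolve the peer cluster UUID first. */
  static final class SourceSketch {
    // Assumption: a failed ZooKeeper lookup (e.g. AuthFailed) is reported as null.
    private final Supplier<UUID> peerUuidLookup;
    private final InitMetrics metrics;
    private boolean initializing = false; // true while this source holds the gauge up
    private UUID peerClusterId;           // set once initialization succeeds

    SourceSketch(Supplier<UUID> peerUuidLookup, InitMetrics metrics) {
      this.peerUuidLookup = peerUuidLookup;
      this.metrics = metrics;
    }

    /**
     * One initialization attempt, meant to be called repeatedly from a single
     * thread (a real source would back off between attempts).
     * Returns true once the peer UUID is known and the shipper could start.
     */
    boolean tryInitialize() {
      if (peerClusterId != null) {
        return true; // already initialized
      }
      if (!initializing) {
        initializing = true;
        metrics.incrSourceInitializing(); // raise the gauge on the first attempt only
      }
      UUID id = peerUuidLookup.get();
      if (id == null) {
        return false; // still stuck: the gauge stays raised
      }
      peerClusterId = id;
      initializing = false;
      metrics.decrSourceInitializing(); // initialized: lower the gauge
      return true;
    }
  }

  public static void main(String[] args) {
    InitMetrics metrics = new InitMetrics();
    UUID peerId = UUID.fromString("00000000-0000-0000-0000-000000000001");
    int[] calls = {0};
    // Hypothetical lookup that fails twice (like the AuthFailed errors) and then succeeds.
    SourceSketch source = new SourceSketch(() -> ++calls[0] <= 2 ? null : peerId, metrics);

    while (!source.tryInitialize()) {
      System.out.println("stuck, sourcesInitializing=" + metrics.getSourcesInitializing());
    }
    System.out.println("initialized, sourcesInitializing=" + metrics.getSourcesInitializing());
  }
}
```

Running `main` prints `stuck, sourcesInitializing=1` twice and then `initialized, sourcesInitializing=0`. Keeping the gauge raised for the whole time a source is stuck, rather than counting failed attempts, lets an operator alert on a nonzero value without reading the region server logs.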