Sandeep Pal created HBASE-25627:
-----------------------------------

             Summary: HBase replication should have a metric to represent if it 
cannot talk to peer's zk
                 Key: HBASE-25627
                 URL: https://issues.apache.org/jira/browse/HBASE-25627
             Project: HBase
          Issue Type: Improvement
            Reporter: Sandeep Pal
            Assignee: Sandeep Pal


There can be situations where the cluster is not able to talk to the peer 
cluster's ZK. In that case the logQueue keeps accumulating, but without digging 
into the logs we cannot tell why the logQueue is accumulating on the source. 

Since the replication source doesn't even start the shipper in this case, it 
would be good to have a dedicated metric that shows when the RS cannot talk to 
the peer's ZK at all. 

 
{code:java}
2021-03-03 04:02:10,704 DEBUG [peerId] zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper, quorum=zookeeper-0.zookeeper-a.fakeAddress:2181,zookeeper-1.zookeeper-a.fakeAddress:2181,zookeeper-2.zookeeper-a.fakeAddress:2181,zookeeper-3.zookeeper-a.fakeAddress:2181,zookeeper-4.zookeeper-a.fakeAddress:2181, exception=org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
2021-03-03 04:02:10,704 DEBUG [peerId] zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper, quorum=zookeeper-0.zookeeper-a.fakeAddress:2181,zookeeper-1.zookeeper-a.fakeAddress:2181,zookeeper-2.zookeeper-a.fakeAddress:2181,zookeeper-3.zookeeper-a.fakeAddress:2181,zookeeper-4.zookeeper-a.fakeAddress:2181, exception=org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1119)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:284)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:469)
	at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
	at org.apache.hadoop.hbase.zookeeper.ZKClusterId.getUUIDForCluster(ZKClusterId.java:96)
	at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.getPeerUUID(HBaseReplicationEndpoint.java:104)
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:306)
{code}
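
To make the proposal concrete, here is a rough, self-contained sketch of what 
such a metric could track. It is not HBase code: the class name 
PeerZkConnectivityMetric and all of its methods are hypothetical, and it 
assumes the peer lookup (the getPeerUUID call in the stack trace above) can be 
wrapped as a Callable that throws when the peer's ZK cannot be reached. A real 
patch would most likely hook into the existing replication source metrics 
instead.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of a "cannot talk to peer's ZK" metric.
 * None of these names exist in HBase; they are for illustration only.
 */
final class PeerZkConnectivityMetric {
    // Failed peer-ZK attempts since the last success; 0 means the last attempt worked.
    private final AtomicLong consecutiveFailures = new AtomicLong();
    // All failed peer-ZK attempts since this object was created.
    private final AtomicLong totalFailures = new AtomicLong();

    /**
     * Runs the peer lookup and records the outcome.
     * Returns the lookup result, or null on failure so the caller can retry.
     */
    <T> T track(Callable<T> peerLookup) {
        try {
            T result = peerLookup.call();
            consecutiveFailures.set(0); // peer's ZK answered, clear the alarm
            return result;
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            }
            consecutiveFailures.incrementAndGet();
            totalFailures.incrementAndGet();
            return null;
        }
    }

    /** Gauge-style value: 1 while the peer's ZK is unreachable, otherwise 0. */
    int peerZkUnreachable() {
        return consecutiveFailures.get() > 0 ? 1 : 0;
    }

    long getConsecutiveFailures() {
        return consecutiveFailures.get();
    }

    long getTotalFailures() {
        return totalFailures.get();
    }
}
```

With something like this, an alert on peerZkUnreachable() == 1 would explain a 
growing logQueue without anyone having to read the DEBUG logs. The consecutive 
counter is kept separate from the total so that a single transient failure can 
be told apart from a peer that has been unreachable for a long time.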



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
