Sandeep Pal created HBASE-24716:
-----------------------------------
Summary: Do the error handling for replication admin failures
Key: HBASE-24716
URL: https://issues.apache.org/jira/browse/HBASE-24716
Project: HBase
Issue Type: Improvement
Components: Replication
Reporter: Sandeep Pal
Assignee: Sandeep Pal
[listPeerConfigs()|https://git.soma.salesforce.com/bigdata-packaging/hbase/blob/1.6.0-sfdc-1/hbase-client/src/main/java/org/apache/hadoop/hbase/client/replication/ReplicationAdmin.java#L295],
which returns the list of peers along with their configurations, is not a
reliable API.
It is not robust to errors: it logs at FATAL and then swallows the
[exceptions|https://github.com/apache/hbase/blob/branch-1/hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeersZKImpl.java#L254].
Snippet:
{code:java}
catch (KeeperException e) {
  this.abortable.abort("Cannot get the list of peers ", e);
} catch (ReplicationException e) {
  this.abortable.abort("Cannot get the list of peers ", e);
}
return peers;
{code}
The abortable (the connection, in this case) does not abort the region server
either; it only logs the error. As a result, upstream callers believe that
nothing is wrong and proceed without taking any corrective action.
{code:java}
2020-07-07 23:11:37,857 FATAL [14774961,peer_id] client.ConnectionManager$HConnectionImplementation - Cannot get the list of peers
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/peers
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1549)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:312)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:513)
    at org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl.getAllPeerConfigs(ReplicationPeersZKImpl.java:249)
    at org.apache.hadoop.hbase.client.replication.ReplicationAdmin.listPeerConfigs(ReplicationAdmin.java:332)
{code}
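To illustrate the difference between swallowing the failure and surfacing it to the caller, here is a minimal, self-contained sketch. It is not the HBase code and not a proposed patch: PeerStore, PeerStoreException, PeerListingException and both listPeerConfigs* methods are hypothetical stand-ins, and the peer configuration is reduced to a plain String. It assumes that one reasonable fix is to wrap the underlying error in a checked exception and rethrow it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch only. Every type in this file is a hypothetical stand-in,
 * not an HBase or ZooKeeper class.
 */
public class PeerListingSketch {

    /** Hypothetical low-level failure, e.g. an expired ZooKeeper session. */
    static class PeerStoreException extends Exception {
        PeerStoreException(String message) { super(message); }
    }

    /** Hypothetical checked exception that the admin API would expose to callers. */
    static class PeerListingException extends Exception {
        PeerListingException(String message, Throwable cause) { super(message, cause); }
    }

    /** Hypothetical storage layer holding peer id -> peer configuration. */
    interface PeerStore {
        Map<String, String> readPeerConfigs() throws PeerStoreException;
    }

    /** Mirrors the reported behaviour: log the failure, then return normally. */
    static Map<String, String> listPeerConfigsSwallowing(PeerStore store) {
        Map<String, String> peers = new HashMap<>();
        try {
            peers.putAll(store.readPeerConfigs());
        } catch (PeerStoreException e) {
            // The caller cannot tell this apart from "no peers configured".
            System.err.println("Cannot get the list of peers: " + e.getMessage());
        }
        return peers;
    }

    /** Alternative: wrap the failure and rethrow it so the caller must handle it. */
    static Map<String, String> listPeerConfigsPropagating(PeerStore store)
            throws PeerListingException {
        try {
            return Collections.unmodifiableMap(new HashMap<>(store.readPeerConfigs()));
        } catch (PeerStoreException e) {
            throw new PeerListingException("Cannot get the list of peers", e);
        }
    }

    public static void main(String[] args) {
        // A store that always fails, standing in for an expired session.
        PeerStore expired = () -> {
            throw new PeerStoreException("Session expired for /hbase/replication/peers");
        };

        System.out.println("swallowing variant returned "
                + listPeerConfigsSwallowing(expired).size() + " peers");
        try {
            listPeerConfigsPropagating(expired);
        } catch (PeerListingException e) {
            System.out.println("propagating variant failed: " + e.getCause().getMessage());
        }
    }
}
```

With the propagating variant, a caller can distinguish "no peers are configured" from "the peer list could not be read" and decide whether to retry or fail, instead of silently continuing.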
--
This message was sent by Atlassian Jira
(v8.3.4#803005)