[
https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265148#comment-17265148
]
Prathyusha commented on HBASE-24972:
------------------------------------
[~stack] Below is the stack trace from a failure incident we have seen:
Cause: org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /hbase/table/SYSTEM.CATALOG
StackTrace:
org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1337)
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:625)
...
StackTraceId: 429763122
But yes, I see that retries are in place wherever we perform write operations.
[~sandeep.guggilam] I think these retries should suffice. Any thoughts?
> Wait for connection attempt to succeed before performing operations on ZK
> -------------------------------------------------------------------------
>
> Key: HBASE-24972
> URL: https://issues.apache.org/jira/browse/HBASE-24972
> Project: HBase
> Issue Type: Bug
> Reporter: Sandeep Guggilam
> Assignee: Prathyusha
> Priority: Minor
>
> Creating the connection with ZK is asynchronous, and the successful
> connection event is delivered via the passed-in watcher. When we attempt any
> operation, we try to create a connection and then perform a read/write
> ([https://github.com/apache/hbase/blob/979edfe72046b2075adcc869c65ae820e6f3ec2d/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L323])
> without actually waiting for the notification event
> ([https://github.com/apache/hbase/blob/979edfe72046b2075adcc869c65ae820e6f3ec2d/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKWatcher.java#L582]).
>
> It is possible to get ConnectionLoss errors when we perform operations on ZK
> without waiting for the connection attempt to succeed.
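For illustration only, the "wait for the connection event before operating" idea from the description can be sketched with a latch that the watcher callback releases. This is a minimal stand-alone sketch, not the HBase code or the fix for this issue: the class ConnectionGate and its methods are hypothetical names, and the background thread merely simulates the asynchronous connection event. With a real ZooKeeper client, onConnected() would presumably be called from Watcher.process() when the event state is KeeperState.SyncConnected.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: blocks callers until a "connected" event has been observed. */
public class ConnectionGate {
    // Released exactly once, when the connection event arrives.
    private final CountDownLatch connected = new CountDownLatch(1);

    /** Intended to be called from the watcher callback on a successful connection. */
    public void onConnected() {
        connected.countDown();
    }

    /** Returns true if the connection event arrived within the timeout, false otherwise. */
    public boolean awaitConnected(long timeoutMs) throws InterruptedException {
        return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        ConnectionGate gate = new ConnectionGate();

        // Simulate the asynchronous connection notification arriving on another thread.
        Thread eventThread = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            gate.onConnected();
        });
        eventThread.start();

        // Only issue reads/writes once the gate has opened.
        if (gate.awaitConnected(5000)) {
            System.out.println("connected; safe to issue reads/writes");
        } else {
            System.out.println("timed out waiting for connection");
        }
        eventThread.join();
    }
}
```

A bounded wait is used so that a caller is not blocked forever if the connection never succeeds; how a timeout should be handled (retry, fail fast) is left open here.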
--
This message was sent by Atlassian Jira
(v8.3.4#803005)