zookeeper client wont reconnect if there is a problem
-----------------------------------------------------
Key: HBASE-1232
URL: https://issues.apache.org/jira/browse/HBASE-1232
Project: Hadoop HBase
Issue Type: Bug
Environment: java 1.7, zookeeper 3.0.1
Reporter: ryan rawson
Assignee: Jean-Daniel Cryans
Fix For: 0.20.0
my regionserver got wedged:
2009-03-02 15:43:30,938 WARN
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to create /hbase:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode =
Session expired for /hbase
at org.apache.zookeeper.KeeperException.create(KeeperException.java:87)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:35)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:482)
at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:219)
at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureParentExists(ZooKeeperWrapper.java:240)
at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.checkOutOfSafeMode(ZooKeeperWrapper.java:328)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:783)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:468)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:443)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:518)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:477)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:450)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:295)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:919)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:950)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1370)
at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1314)
at org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1294)
at org.apache.hadoop.hbase.RegionHistorian.add(RegionHistorian.java:237)
at org.apache.hadoop.hbase.RegionHistorian.add(RegionHistorian.java:216)
at
org.apache.hadoop.hbase.RegionHistorian.addRegionSplit(RegionHistorian.java:174)
at
org.apache.hadoop.hbase.regionserver.HRegion.splitRegion(HRegion.java:607)
at
org.apache.hadoop.hbase.regionserver.CompactSplitThread.split(CompactSplitThread.java:174)
at
org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:107)
this message repeats over and over.
Looking at the code in question:
private boolean ensureExists(final String znode) {
try {
zooKeeper.create(znode, new byte[0],
Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
LOG.debug("Created ZNode " + znode);
return true;
} catch (KeeperException.NodeExistsException e) {
return true; // ok, move on.
} catch (KeeperException.NoNodeException e) {
return ensureParentExists(znode) && ensureExists(znode);
} catch (KeeperException e) {
LOG.warn("Failed to create " + znode + ":", e);
} catch (InterruptedException e) {
LOG.warn("Failed to create " + znode + ":", e);
}
return false;
}
We need to catch this exception specifically and reopen the ZK connection.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.