Looks like the fix from HBASE-7779 wasn't included. See: https://issues.apache.org/jira/secure/attachment/12568663/7779-v2.txt
I have created HBASE-8019 for this issue. Thanks for reporting.

On Wed, Mar 6, 2013 at 5:04 PM, Richard Ding <[email protected]> wrote:
> While trying the snapshot code in HBase 0.94 branch (should be the same as
> 0.94.6RC0), we encountered the problem that HBase region servers take long
> time to shutdown (see the log below). This problem, however, doesn't exist
> in 0.94.5. It looks like in RegionServerSnapshotManager.stop() method, the
> ZK session is closed. This results in SessionExpiredException when
> HRegionServer tries to delete MyEphemeralNode.
>
> ... ...
> 2013-03-06 11:53:19,767 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 256000ms before retry #8...
> 2013-03-06 11:57:35,806 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
> 2013-03-06 11:57:35,806 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 512000ms before retry #9...
> 2013-03-06 12:06:07,882 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
> 2013-03-06 12:06:07,882 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 1024000ms before retry #10...
> 2013-03-06 12:23:12,034 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
> 2013-03-06 12:23:12,034 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 10 retries
> 2013-03-06 12:23:12,034 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed deleting my ephemeral node
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:999)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:988)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1097)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:875)
>     at java.lang.Thread.run(Thread.java:738)
> 2013-03-06 12:23:12,036 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server hdtest010.svl.ibm.com,60020,1362529262252; zookeeper connection closed.
> 2013-03-06 12:23:12,036 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
> 2013-03-06 12:23:12,039 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-12,5,main]
> 2013-03-06 12:23:12,039 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
> 2013-03-06 12:23:12,039 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
> 2013-03-06 12:23:12,042 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
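
For reference, the sleep values in the quoted log (256000 ms, 512000 ms, 1024000 ms for retries #8 to #10) are consistent with a backoff that doubles on every retry, which would explain why the shutdown drags on for so long once the session is gone. Below is a minimal sketch of that arithmetic in Java. The 1000 ms base is only inferred from those three log lines, and the class and method names are hypothetical; this is not the actual RetryCounter code.

```java
public class BackoffSketch {
    // ASSUMPTION: base sleep of 1000 ms, inferred from the log
    // (256000 ms at retry #8 equals 1000 * 2^8). Not taken from HBase source.
    static final long BASE_SLEEP_MS = 1000L;

    // Sleep before the given retry number, doubling with each retry.
    static long sleepMillis(int retry) {
        return BASE_SLEEP_MS << retry;
    }

    // Total time spent sleeping across an inclusive range of retries.
    static long totalSleepMillis(int firstRetry, int lastRetry) {
        long total = 0;
        for (int retry = firstRetry; retry <= lastRetry; retry++) {
            total += sleepMillis(retry);
        }
        return total;
    }

    public static void main(String[] args) {
        // Reproduce the three sleeps visible in the quoted log.
        for (int retry = 8; retry <= 10; retry++) {
            System.out.println("retry #" + retry + ": " + sleepMillis(retry) + " ms");
        }
        System.out.println("retries #8-#10 total: " + totalSleepMillis(8, 10) + " ms");
    }
}
```

The last three sleeps alone add up to 1792000 ms, just under 30 minutes, which lines up with the gap between the 11:53:19 and 12:23:12 timestamps in the log.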
