Thanks, Ted, for the quick solution.

On Wed, Mar 6, 2013 at 5:25 PM, Ted Yu <[email protected]> wrote:

> Richard:
> If you can try out the fix from HBASE-8019, that would be great.
>
> Meanwhile, I will run the fix through 0.94 test suite.
>
> Cheers
>
> On Wed, Mar 6, 2013 at 5:19 PM, Ted Yu <[email protected]> wrote:
>
> > Looks like the fix from HBASE-7779 wasn't included.
> > See:
> > https://issues.apache.org/jira/secure/attachment/12568663/7779-v2.txt
> >
> > I have created HBASE-8019 for this issue.
> >
> > Thanks for reporting.
> >
> >
> > On Wed, Mar 6, 2013 at 5:04 PM, Richard Ding <[email protected]> wrote:
> >
> >> While trying the snapshot code in HBase 0.94 branch (should be the same as
> >> 0.94.6RC0), we encountered the problem that HBase region servers take long
> >> time to shutdown (see the log below). This problem, however, doesn't exist
> >> in 0.94.5. It looks like in RegionServerSnapshotManager.stop() method, the
> >> ZK session is closed. This results in SessionExpiredException when
> >> HRegionServer tries to delete MyEphemeralNode.
> >>
> >> ... ...
> >> 2013-03-06 11:53:19,767 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 256000ms before retry #8...
> >> 2013-03-06 11:57:35,806 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
> >> 2013-03-06 11:57:35,806 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 512000ms before retry #9...
> >> 2013-03-06 12:06:07,882 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
> >> 2013-03-06 12:06:07,882 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 1024000ms before retry #10...
> >> 2013-03-06 12:23:12,034 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
> >> 2013-03-06 12:23:12,034 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 10 retries
> >> 2013-03-06 12:23:12,034 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed deleting my ephemeral node
> >> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
> >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> >>     at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
> >>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
> >>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:999)
> >>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:988)
> >>     at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1097)
> >>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:875)
> >>     at java.lang.Thread.run(Thread.java:738)
> >> 2013-03-06 12:23:12,036 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server hdtest010.svl.ibm.com,60020,1362529262252; zookeeper connection closed.
> >> 2013-03-06 12:23:12,036 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
> >> 2013-03-06 12:23:12,039 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-12,5,main]
> >> 2013-03-06 12:23:12,039 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
> >> 2013-03-06 12:23:12,039 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
> >> 2013-03-06 12:23:12,042 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
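
A note on why the shutdown takes so long: the sleep intervals in the quoted log double on each retry (256000 ms, 512000 ms and 1024000 ms before retries #8, #9 and #10), so the last three sleeps alone add up to roughly 30 minutes, which matches the timestamps (11:53:19 to 12:23:12). Below is a minimal sketch of that arithmetic. It assumes a 1000 ms base that doubles per retry, which fits the three intervals visible in the log; the earlier retries are elided from the log, so the total over all ten retries is only an estimate under that assumption. The class and method names are hypothetical and this is not the actual RetryCounter code.

```java
// Hypothetical sketch; NOT the actual org.apache.hadoop.hbase.util.RetryCounter code.
class BackoffSketch {
    // Assumed base interval, chosen so that retry #8 yields the 256000 ms seen in the log.
    static final long BASE_SLEEP_MS = 1000L;

    // Sleep before the given retry number, assuming the interval doubles each retry.
    static long sleepBeforeRetryMs(int retry) {
        return BASE_SLEEP_MS << retry; // BASE_SLEEP_MS * 2^retry
    }

    // Sum of the sleeps for retries firstRetry..lastRetry (inclusive).
    static long totalSleepMs(int firstRetry, int lastRetry) {
        long total = 0;
        for (int r = firstRetry; r <= lastRetry; r++) {
            total += sleepBeforeRetryMs(r);
        }
        return total;
    }

    public static void main(String[] args) {
        // The three sleeps that actually appear in the log.
        for (int r = 8; r <= 10; r++) {
            System.out.println("retry #" + r + ": " + sleepBeforeRetryMs(r) + " ms");
        }
        // Observed portion: 256000 + 512000 + 1024000 = 1792000 ms (about 29.9 minutes).
        System.out.println("retries 8-10: " + totalSleepMs(8, 10) + " ms");
        // Estimate only: assumes retries #1-#7 followed the same doubling pattern.
        System.out.println("retries 1-10 (assumed): " + totalSleepMs(1, 10) + " ms");
    }
}
```

Under this assumption the region server would spend about 34 minutes sleeping before giving up on the delete, since every attempt fails once the ZK session has already been closed.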
