[ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151650#comment-13151650
 ] 

nkeywal commented on HBASE-4798:
--------------------------------

This said, when I look at the traces, it seems that it's something different 
that's going on.
@stack; please just confirm that the cluster should be fully available even if 
root is corrupted. I will then play with the debugger;

fyi, here are the logs:
{noformat}
2011-11-16 10:21:19,492 INFO  [main] 
client.HConnectionManager$HConnectionImplementation(1706): Closed zookeeper 
sessionid=0x133ad9c76a60003
2011-11-16 10:21:19,494 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(238): master:58532-0x133ad9c76a60000 Received 
ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/master
2011-11-16 10:21:19,495 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(238): hconnection-0x133ad9c76a60003 Received 
ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/master
2011-11-16 10:21:19,497 INFO  [Master:0;localhost,58532,1321467672954] 
master.HMaster(336): HMaster main thread exiting
2011-11-16 10:21:19,506 WARN  [main-EventThread] zookeeper.ZKUtil(234): 
hconnection-0x133ad9c76a60003 Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /hbase/master
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
        at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
        at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:225)
        at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.nodeDeleted(ZooKeeperNodeTracker.java:182)
        at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:262)
        at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
2011-11-16 10:21:19,506 ERROR [main-EventThread] 
zookeeper.ZooKeeperWatcher(363): hconnection-0x133ad9c76a60003 Received 
unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /hbase/master
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
        at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
        at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:225)
        at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.nodeDeleted(ZooKeeperNodeTracker.java:182)
        at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:262)
        at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
2011-11-16 10:21:19,523 INFO  [main-EventThread] 
client.HConnectionManager$HConnectionImplementation(1624): This client just 
lost it's session with ZooKeeper, trying to reconnect.
2011-11-16 10:21:19,523 INFO  [main-EventThread] 
client.HConnectionManager$HConnectionImplementation(541): Trying to reconnect 
to zookeeper
2011-11-16 10:21:19,524 DEBUG [main-EventThread] zookeeper.ZKUtil(95): 
hconnection opening connection to ZooKeeper with ensemble (localhost:56525)
2011-11-16 10:21:19,526 INFO  [main-EventThread] 
zookeeper.RecoverableZooKeeper(89): The identifier of this process is 
19557@ubuntu
2011-11-16 10:21:19,532 DEBUG [main] zookeeper.ZKUtil(95): Bad Root Location 
Writer opening connection to ZooKeeper with ensemble (localhost:56525)
2011-11-16 10:21:19,533 INFO  [main] zookeeper.RecoverableZooKeeper(89): The 
identifier of this process is 19557@ubuntu
2011-11-16 10:21:19,534 INFO  [main] catalog.RootLocationEditor(62): Setting 
ROOT region location in ZooKeeper as example.org,1234,1321467679534
2011-11-16 10:21:19,536 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(238): hconnection Received ZooKeeper Event, 
type=None, state=SyncConnected, path=null
2011-11-16 10:21:19,537 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(238): Bad Root Location Writer Received ZooKeeper 
Event, type=None, state=SyncConnected, path=null
2011-11-16 10:21:19,537 DEBUG [main-EventThread] zookeeper.ZKUtil(230): 
hconnection /hbase/master does not exist. Watcher is set.
2011-11-16 10:21:19,538 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(315): hconnection-0x133ad9c76a60004 connected
2011-11-16 10:21:19,539 ERROR [main] zookeeper.RecoverableZooKeeper(478): Node 
/hbase/root-region-server already exists and this is not a retry
2011-11-16 10:21:19,539 DEBUG [main-EventThread] zookeeper.ZKUtil(228): 
hconnection-0x133ad9c76a60004 Set watcher on existing znode 
/hbase/root-region-server
2011-11-16 10:21:19,540 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(315): Bad Root Location Writer-0x133ad9c76a60005 
connected
2011-11-16 10:21:19,540 DEBUG [main] catalog.RootLocationEditor(67): ROOT 
region location already existed, updated location
2011-11-16 10:21:19,540 DEBUG [main-EventThread] zookeeper.ZKUtil(1034): 
hconnection-0x133ad9c76a60004 Retrieved 29 byte(s) of data from znode 
/hbase/root-region-server and set watcher; localhost,44843,1321467673319
2011-11-16 10:21:19,540 INFO  [main-EventThread] 
client.HConnectionManager$HConnectionImplementation(1627): Reconnected 
successfully. This disconnect could have been caused by a network partition or 
a long-running GC pause, either way it's recommended that you verify your 
environment.
2011-11-16 10:21:19,542 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(238): hconnection-0x133ad9c76a60004 Received 
ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, 
path=/hbase/root-region-server
2011-11-16 10:21:19,544 DEBUG [main-EventThread] zookeeper.ZKUtil(1034): 
hconnection-0x133ad9c76a60004 Retrieved 30 byte(s) of data from znode 
/hbase/root-region-server and set watcher; example.org,1234,1321467679534
2011-11-16 10:21:19,555 DEBUG [main] util.FSUtils(288): Created version file at 
hdfs://localhost:40123/user/nkeywal set its version at:7
2011-11-16 10:21:19,568 DEBUG [main] client.HConnectionManager(1805): Set 
serverside HConnection retries=100
2011-11-16 10:21:19,573 INFO  [main] ipc.HBaseRpcMetrics(59): Initializing RPC 
Metrics with hostName=HMaster, port=50466
2011-11-16 10:21:19,582 DEBUG [main] zookeeper.ZKUtil(95): master:50466 opening 
connection to ZooKeeper with ensemble (localhost:56525)
2011-11-16 10:21:19,584 INFO  [main] zookeeper.RecoverableZooKeeper(89): The 
identifier of this process is 19557@ubuntu
2011-11-16 10:21:19,588 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(238): master:50466 Received ZooKeeper Event, 
type=None, state=SyncConnected, path=null
2011-11-16 10:21:19,590 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(315): master:50466-0x133ad9c76a60006 connected
2011-11-16 10:21:19,592 INFO  [main] metrics.MetricsMBeanBase(111): new 
MBeanInfo
2011-11-16 10:21:19,592 INFO  [main] metrics.MasterMetrics(89): Initialized
2011-11-16 10:21:19,613 DEBUG [main] client.HConnectionManager(1805): Set 
serverside HConnection retries=100
2011-11-16 10:21:19,619 INFO  [main] ipc.HBaseRpcMetrics(59): Initializing RPC 
Metrics with hostName=MiniHBaseCluster$MiniHBaseClusterRegionServer, port=55575
2011-11-16 10:21:19,625 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(238): hconnection-0x133ad9c76a60004 Received 
ZooKeeper Event, type=NodeCreated, state=SyncConnected, path=/hbase/master
2011-11-16 10:21:19,625 INFO  [Master:0;localhost,50466,1321467679574] 
master.ActiveMasterManager(139): Master=localhost,50466,1321467679574
2011-11-16 10:21:19,626 DEBUG [Master:0;localhost,50466,1321467679574] 
master.SplitLogManager(159): timeout = 25000
2011-11-16 10:21:19,626 DEBUG [Master:0;localhost,50466,1321467679574] 
master.SplitLogManager(160): unassigned timeout = 180000
2011-11-16 10:21:19,626 DEBUG [main-EventThread] zookeeper.ZKUtil(1034): 
hconnection-0x133ad9c76a60004 Retrieved 31 byte(s) of data from znode 
/hbase/master and set watcher; \x00\x00localhost,50466,13214...
2011-11-16 10:21:19,627 INFO  [Master:0;localhost,50466,1321467679574] 
master.SplitLogManager(684): found 0 orphan tasks and 0 rescan nodes
2011-11-16 10:21:19,647 DEBUG [Master:0;localhost,50466,1321467679574] 
zookeeper.ZKUtil(95): hconnection opening connection to ZooKeeper with ensemble 
(localhost:56525)
2011-11-16 10:21:19,648 INFO  [Master:0;localhost,50466,1321467679574] 
zookeeper.RecoverableZooKeeper(89): The identifier of this process is 
19557@ubuntu
2011-11-16 10:21:19,653 DEBUG 
[Master:0;localhost,50466,1321467679574-EventThread] 
zookeeper.ZooKeeperWatcher(238): hconnection Received ZooKeeper Event, 
type=None, state=SyncConnected, path=null
2011-11-16 10:21:19,653 DEBUG [Master:0;localhost,50466,1321467679574] 
zookeeper.ZKUtil(228): hconnection Set watcher on existing znode /hbase/master
2011-11-16 10:21:19,654 DEBUG 
[Master:0;localhost,50466,1321467679574-EventThread] 
zookeeper.ZooKeeperWatcher(315): hconnection-0x133ad9c76a60007 connected
2011-11-16 10:21:19,654 DEBUG [Master:0;localhost,50466,1321467679574] 
zookeeper.ZKUtil(1034): hconnection-0x133ad9c76a60007 Retrieved 31 byte(s) of 
data from znode /hbase/master and set watcher; \x00\x00localhost,50466,13214...
2011-11-16 10:21:19,655 DEBUG [Master:0;localhost,50466,1321467679574] 
zookeeper.ZKUtil(228): hconnection-0x133ad9c76a60007 Set watcher on existing 
znode /hbase/root-region-server
2011-11-16 10:21:19,656 DEBUG [Master:0;localhost,50466,1321467679574] 
zookeeper.ZKUtil(1034): hconnection-0x133ad9c76a60007 Retrieved 30 byte(s) of 
data from znode /hbase/root-region-server and set watcher; 
example.org,1234,1321467679534
2011-11-16 10:21:19,656 DEBUG [Master:0;localhost,50466,1321467679574] 
catalog.CatalogTracker(217): Starting catalog tracker 
org.apache.hadoop.hbase.catalog.CatalogTracker@2e18ff27
2011-11-16 10:21:19,657 DEBUG [Master:0;localhost,50466,1321467679574] 
zookeeper.ZKUtil(228): master:50466-0x133ad9c76a60006 Set watcher on existing 
znode /hbase/root-region-server
2011-11-16 10:21:19,658 DEBUG [Master:0;localhost,50466,1321467679574] 
zookeeper.ZKUtil(1034): master:50466-0x133ad9c76a60006 Retrieved 30 byte(s) of 
data from znode /hbase/root-region-server and set watcher; 
example.org,1234,1321467679534
2011-11-16 10:21:19,658 DEBUG [Master:0;localhost,50466,1321467679574] 
zookeeper.ZKUtil(230): master:50466-0x133ad9c76a60006 
/hbase/unassigned/1028785192 does not exist. Watcher is set.
2011-11-16 10:21:19,660 DEBUG [Master:0;localhost,50466,1321467679574] 
zookeeper.ZKUtil(230): master:50466-0x133ad9c76a60006 /hbase/shutdown does not 
exist. Watcher is set.
2011-11-16 10:21:19,663 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(238): master:50466-0x133ad9c76a60006 Received 
ZooKeeper Event, type=NodeCreated, state=SyncConnected, path=/hbase/shutdown
2011-11-16 10:21:19,664 INFO  [Master:0;localhost,50466,1321467679574] 
master.HMaster(387): Server active/primary master; 
localhost,50466,1321467679574, sessionid=0x133ad9c76a60006, cluster-up flag 
was=false
2011-11-16 10:21:19,664 DEBUG [Master:0;localhost,50466,1321467679574] 
executor.ExecutorService(190): Starting executor service 
name=MASTER_OPEN_REGION-localhost,50466,1321467679574, corePoolSize=5, 
maxPoolSize=5
2011-11-16 10:21:19,664 DEBUG [Master:0;localhost,50466,1321467679574] 
executor.ExecutorService(190): Starting executor service 
name=MASTER_CLOSE_REGION-localhost,50466,1321467679574, corePoolSize=5, 
maxPoolSize=5
2011-11-16 10:21:19,664 DEBUG [Master:0;localhost,50466,1321467679574] 
executor.ExecutorService(190): Starting executor service 
name=MASTER_SERVER_OPERATIONS-localhost,50466,1321467679574, corePoolSize=3, 
maxPoolSize=3
2011-11-16 10:21:19,664 DEBUG [Master:0;localhost,50466,1321467679574] 
executor.ExecutorService(190): Starting executor service 
name=MASTER_META_SERVER_OPERATIONS-localhost,50466,1321467679574, 
corePoolSize=5, maxPoolSize=5
2011-11-16 10:21:19,665 DEBUG [Master:0;localhost,50466,1321467679574] 
executor.ExecutorService(190): Starting executor service 
name=MASTER_TABLE_OPERATIONS-localhost,50466,1321467679574, corePoolSize=1, 
maxPoolSize=1
2011-11-16 10:21:19,665 DEBUG [Master:0;localhost,50466,1321467679574] 
master.LogCleaner(115): Add log cleaner in chain: 
org.apache.hadoop.hbase.master.TimeToLiveLogCleaner
2011-11-16 10:21:19,665 DEBUG [main-EventThread] zookeeper.ZKUtil(1034): 
master:50466-0x133ad9c76a60006 Retrieved 28 byte(s) of data from znode 
/hbase/shutdown and set watcher; Wed Nov 16 10:21:19 PST 2011
2011-11-16 10:21:19,666 DEBUG [Master:0;localhost,50466,1321467679574] 
master.HMaster(703): Started service threads
2011-11-16 10:21:19,723 INFO  [main] regionserver.ShutdownHook(85): Installed 
shutdown hook thread: Shutdownhook:RegionServer:0;localhost,55575,1321467679622
2011-11-16 10:21:19,726 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
zookeeper.ZKUtil(95): regionserver:55575 opening connection to ZooKeeper with 
ensemble (localhost:56525)
2011-11-16 10:21:19,727 INFO  [RegionServer:0;localhost,55575,1321467679622] 
zookeeper.RecoverableZooKeeper(89): The identifier of this process is 
19557@ubuntu
2011-11-16 10:21:19,733 DEBUG 
[RegionServer:0;localhost,55575,1321467679622-EventThread] 
zookeeper.ZooKeeperWatcher(238): regionserver:55575 Received ZooKeeper Event, 
type=None, state=SyncConnected, path=null
2011-11-16 10:21:19,733 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
zookeeper.ZKUtil(228): regionserver:55575 Set watcher on existing znode 
/hbase/master
2011-11-16 10:21:19,734 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
zookeeper.ZKUtil(1034): regionserver:55575 Retrieved 31 byte(s) of data from 
znode /hbase/master and set watcher; \x00\x00localhost,50466,13214...
2011-11-16 10:21:19,734 DEBUG 
[RegionServer:0;localhost,55575,1321467679622-EventThread] 
zookeeper.ZooKeeperWatcher(315): regionserver:55575-0x133ad9c76a60008 connected
2011-11-16 10:21:19,735 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
zookeeper.ZKUtil(228): regionserver:55575-0x133ad9c76a60008 Set watcher on 
existing znode /hbase/shutdown
2011-11-16 10:21:19,736 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
zookeeper.ZKUtil(1034): regionserver:55575-0x133ad9c76a60008 Retrieved 28 
byte(s) of data from znode /hbase/shutdown and set watcher; Wed Nov 16 10:21:19 
PST 2011
2011-11-16 10:21:19,736 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
catalog.CatalogTracker(217): Starting catalog tracker 
org.apache.hadoop.hbase.catalog.CatalogTracker@35d56bbe
2011-11-16 10:21:19,737 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
zookeeper.ZKUtil(228): regionserver:55575-0x133ad9c76a60008 Set watcher on 
existing znode /hbase/root-region-server
2011-11-16 10:21:19,738 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
zookeeper.ZKUtil(1034): regionserver:55575-0x133ad9c76a60008 Retrieved 30 
byte(s) of data from znode /hbase/root-region-server and set watcher; 
example.org,1234,1321467679534
2011-11-16 10:21:19,738 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
zookeeper.ZKUtil(230): regionserver:55575-0x133ad9c76a60008 
/hbase/unassigned/1028785192 does not exist. Watcher is set.
2011-11-16 10:21:19,739 INFO  [RegionServer:0;localhost,55575,1321467679622] 
regionserver.MemStoreFlusher(114): globalMemStoreLimit=497.8m, 
globalMemStoreLimitLowMark=435.6m, maxHeap=1.2g
2011-11-16 10:21:19,739 INFO  [RegionServer:0;localhost,55575,1321467679622] 
regionserver.HRegionServer$CompactionChecker(1158): Runs every 16mins, 40sec
2011-11-16 10:21:19,767 INFO  [Master:0;localhost,50466,1321467679574] 
master.ServerManager(530): Waiting on regionserver(s) to checkin
2011-11-16 10:21:19,774 INFO  [RegionServer:0;localhost,55575,1321467679622] 
regionserver.HRegionServer(1737): Attempting connect to Master server at 
localhost,50466,1321467679574
2011-11-16 10:21:19,779 INFO  [RegionServer:0;localhost,55575,1321467679622] 
regionserver.HRegionServer(1766): Connected to master at 
localhost/127.0.0.1:50466
2011-11-16 10:21:19,779 INFO  [RegionServer:0;localhost,55575,1321467679622] 
regionserver.HRegionServer(1792): Telling master at 
localhost,50466,1321467679574 that we are up with port=55575, 
startcode=1321467679622
2011-11-16 10:21:19,780 INFO  [IPC Server handler 2 on 50466] 
master.ServerManager(239): Registering server=localhost,55575,1321467679622
2011-11-16 10:21:19,781 INFO  [RegionServer:0;localhost,55575,1321467679622] 
regionserver.HRegionServer(911): Master passed us hostname to use. 
Was=localhost, Now=localhost
2011-11-16 10:21:19,781 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
regionserver.HRegionServer(918): Config from master: 
fs.default.name=hdfs://localhost:40123
2011-11-16 10:21:19,782 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
regionserver.HRegionServer(918): Config from master: 
hbase.rootdir=hdfs://localhost:40123/user/nkeywal
2011-11-16 10:21:19,783 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(238): master:50466-0x133ad9c76a60006 Received 
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/rs
2011-11-16 10:21:19,786 DEBUG [main-EventThread] zookeeper.ZKUtil(228): 
master:50466-0x133ad9c76a60006 Set watcher on existing znode 
/hbase/rs/localhost,55575,1321467679622
2011-11-16 10:21:19,788 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
regionserver.HRegionServer(1219): 
logdir=hdfs://localhost:40123/user/nkeywal/.logs/localhost,55575,1321467679622
2011-11-16 10:21:19,805 INFO  [RegionServer:0;localhost,55575,1321467679622] 
wal.HLog(417): HLog configuration: blocksize=64 MB, rollsize=60.8 MB, 
enabled=true, optionallogflushinternal=1000ms
2011-11-16 10:21:19,806 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
wal.SequenceFileLogWriter(113): new createWriter -- HADOOP-6840 -- not available
2011-11-16 10:21:19,810 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
wal.SequenceFileLogWriter(136): 
Path=hdfs://localhost:40123/user/nkeywal/.logs/localhost,55575,1321467679622/localhost%2C55575%2C1321467679622.1321467679806,
 syncFs=true, hflush=false
2011-11-16 10:21:19,810 INFO  [RegionServer:0;localhost,55575,1321467679622] 
wal.HLog(614):  for 
/user/nkeywal/.logs/localhost,55575,1321467679622/localhost%2C55575%2C1321467679622.1321467679806
2011-11-16 10:21:19,810 INFO  [RegionServer:0;localhost,55575,1321467679622] 
wal.HLog(459): Using getNumCurrentReplicas--HDFS-826
2011-11-16 10:21:19,811 INFO  [RegionServer:0;localhost,55575,1321467679622] 
metrics.MetricsMBeanBase(111): new MBeanInfo
2011-11-16 10:21:19,811 INFO  [RegionServer:0;localhost,55575,1321467679622] 
metrics.RegionServerMetrics(248): Initialized
2011-11-16 10:21:19,812 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
executor.ExecutorService(190): Starting executor service 
name=RS_OPEN_REGION-localhost,55575,1321467679622, corePoolSize=3, maxPoolSize=3
2011-11-16 10:21:19,812 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
executor.ExecutorService(190): Starting executor service 
name=RS_OPEN_ROOT-localhost,55575,1321467679622, corePoolSize=1, maxPoolSize=1
2011-11-16 10:21:19,812 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
executor.ExecutorService(190): Starting executor service 
name=RS_OPEN_META-localhost,55575,1321467679622, corePoolSize=1, maxPoolSize=1
2011-11-16 10:21:19,813 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
executor.ExecutorService(190): Starting executor service 
name=RS_CLOSE_REGION-localhost,55575,1321467679622, corePoolSize=3, 
maxPoolSize=3
2011-11-16 10:21:19,813 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
executor.ExecutorService(190): Starting executor service 
name=RS_CLOSE_ROOT-localhost,55575,1321467679622, corePoolSize=1, maxPoolSize=1
2011-11-16 10:21:19,813 DEBUG [RegionServer:0;localhost,55575,1321467679622] 
executor.ExecutorService(190): Starting executor service 
name=RS_CLOSE_META-localhost,55575,1321467679622, corePoolSize=1, maxPoolSize=1
2011-11-16 10:21:19,860 INFO  [RegionServer:0;localhost,55575,1321467679622] 
regionserver.HRegionServer(947): Serving as localhost,55575,1321467679622, RPC 
listening on localhost/127.0.0.1:55575, sessionid=0x133ad9c76a60008
2011-11-16 10:21:19,860 INFO  [SplitLogWorker-localhost,55575,1321467679622] 
regionserver.SplitLogWorker(140): SplitLogWorker localhost,55575,1321467679622 
starting
2011-11-16 10:21:19,868 INFO  [Master:0;localhost,50466,1321467679574] 
master.ServerManager(530): Waiting on regionserver(s) count to settle; 
currently=1
2011-11-16 10:21:19,973 INFO  [Master:0;localhost,50466,1321467679574] 
master.MasterFileSystem(212): Log folder 
hdfs://localhost:40123/user/nkeywal/.logs/localhost,55575,1321467679622 belongs 
to an existing region server
2011-11-16 10:21:19,973 INFO  [Master:0;localhost,50466,1321467679574] 
master.MasterFileSystem(266): No logs to split
2011-11-16 10:21:19,975 DEBUG [Master:0;localhost,50466,1321467679574] 
zookeeper.ZKUtil(518): master:50466-0x133ad9c76a60006 Unable to get data of 
znode /hbase/unassigned/70236052 because node does not exist (not an error)
2011-11-16 10:21:19,978 DEBUG [Master:0;localhost,50466,1321467679574] 
zookeeper.ZKUtil(1034): master:50466-0x133ad9c76a60006 Retrieved 30 byte(s) of 
data from znode /hbase/root-region-server and set watcher; 
example.org,1234,1321467679534
2011-11-16 10:21:19,981 DEBUG [Master:0;localhost,50466,1321467679574] 
zookeeper.ZKUtil(1034): hconnection-0x133ad9c76a60007 Retrieved 36 byte(s) of 
data from znode /hbase/hbaseid; data=03980100-b09a-48c1-9bf2-70f49...
2011-11-16 10:21:24,550 INFO  [Thread-133] catalog.RootLocationEditor(43): 
Unsetting ROOT region location in ZooKeeper
2011-11-16 10:21:24,575 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(238): master:50466-0x133ad9c76a60006 Received 
ZooKeeper Event, type=NodeDeleted, state=SyncConnected, 
path=/hbase/root-region-server
2011-11-16 10:21:24,575 DEBUG 
[Master:0;localhost,50466,1321467679574-EventThread] 
zookeeper.ZooKeeperWatcher(238): hconnection-0x133ad9c76a60007 Received 
ZooKeeper Event, type=NodeDeleted, state=SyncConnected, 
path=/hbase/root-region-server
2011-11-16 10:21:24,576 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(238): hconnection-0x133ad9c76a60004 Received 
ZooKeeper Event, type=NodeDeleted, state=SyncConnected, 
path=/hbase/root-region-server
2011-11-16 10:21:24,580 DEBUG 
[RegionServer:0;localhost,55575,1321467679622-EventThread] 
zookeeper.ZooKeeperWatcher(238): regionserver:55575-0x133ad9c76a60008 Received 
ZooKeeper Event, type=NodeDeleted, state=SyncConnected, 
path=/hbase/root-region-server
2011-11-16 10:21:24,581 INFO  [Thread-133] 
catalog.TestCatalogTrackerOnCluster$2(85): RootLocationEditor is now clean.
2011-11-16 10:21:24,581 DEBUG 
[Master:0;localhost,50466,1321467679574-EventThread] zookeeper.ZKUtil(230): 
hconnection-0x133ad9c76a60007 /hbase/root-region-server does not exist. Watcher 
is set.
2011-11-16 10:21:24,584 DEBUG [main-EventThread] zookeeper.ZKUtil(230): 
hconnection-0x133ad9c76a60004 /hbase/root-region-server does not exist. Watcher 
is set.
2011-11-16 10:21:24,584 DEBUG [main-EventThread] zookeeper.ZKUtil(230): 
master:50466-0x133ad9c76a60006 /hbase/root-region-server does not exist. 
Watcher is set.
2011-11-16 10:21:24,585 DEBUG 
[RegionServer:0;localhost,55575,1321467679622-EventThread] 
zookeeper.ZKUtil(230): regionserver:55575-0x133ad9c76a60008 
/hbase/root-region-server does not exist. Watcher is set.
2011-11-16 10:21:49,769 ERROR [main] hbase.MiniHBaseCluster(201): Error 
starting cluster
java.lang.RuntimeException: Master not initialized after 30 seconds
        at 
org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206)
        at 
org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:418)
{noformat}
                
> Sleeps and synchronisation improvements for tests
> -------------------------------------------------
>
>                 Key: HBASE-4798
>                 URL: https://issues.apache.org/jira/browse/HBASE-4798
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver, test
>    Affects Versions: 0.94.0
>         Environment: all
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 4798_trunk_all.v2.patch
>
>
> Multiple small changes:
> @commiters: Removing some sleeps made visible a bug on 
> JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchro point. 
> You may want to review this.
> JVMClusterUtil#HMaster#waitForServerOnline: removed, the condition was never 
> met (test on "!c && !!c"). Added a new synchronization point.
> AssignementManager#waitForAssignment: add a timeout on the wait => not stuck 
> if the notification is received before the wait.
> HMaster#loop: use a notification instead of a 1s sleep
> HRegionServer#waitForServerOnline: new method used by 
> JVMClusterUtil#waitForServerOnline() to replace a 1s sleep by a notification
> HRegionServer#getMaster() 1s sleeps replaced by one 0,1s sleep and one 0,2s 
> sleep
> HRegionServer#stop: use a notification on sleeper to lower shutdown by 0,5s
> ZooKeeperNodeTracker#start: replace a recursive call by a loop
> ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait => not 
> stuck if the notification is received before the wait.
> HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s
> TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s, 
> with the change on HBaseTestingUtility we are 60s faster
> TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0,2s instead 
> of 1s
> TestRestartCluster#testClusterRestart: send all the table creation together, 
> then check creation, should be faster
> TestHLog: shutdown the whole cluster instead of DFS only (more standard) 
> JVMClusterUtil#startup: lower the sleep from 1s to 0,1s
> HConnectionManager#close: Zookeeper name in debug message from 
> HConnectionManager after connection close was always null because it was set 
> to null in the delete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to