[
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151650#comment-13151650
]
nkeywal commented on HBASE-4798:
--------------------------------
This said, when I look at the traces, it seems that it's something different
that's going on.
@stack; please just confirm that the cluster should be fully available even if
root is corrupted. I will then play with the debugger;
fyi, here are the logs:
{noformat}
2011-11-16 10:21:19,492 INFO [main]
client.HConnectionManager$HConnectionImplementation(1706): Closed zookeeper
sessionid=0x133ad9c76a60003
2011-11-16 10:21:19,494 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(238): master:58532-0x133ad9c76a60000 Received
ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/master
2011-11-16 10:21:19,495 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(238): hconnection-0x133ad9c76a60003 Received
ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/master
2011-11-16 10:21:19,497 INFO [Master:0;localhost,58532,1321467672954]
master.HMaster(336): HMaster main thread exiting
2011-11-16 10:21:19,506 WARN [main-EventThread] zookeeper.ZKUtil(234):
hconnection-0x133ad9c76a60003 Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode =
Session expired for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
at
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
at
org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:225)
at
org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.nodeDeleted(ZooKeeperNodeTracker.java:182)
at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:262)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
2011-11-16 10:21:19,506 ERROR [main-EventThread]
zookeeper.ZooKeeperWatcher(363): hconnection-0x133ad9c76a60003 Received
unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode =
Session expired for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
at
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
at
org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:225)
at
org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.nodeDeleted(ZooKeeperNodeTracker.java:182)
at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:262)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
2011-11-16 10:21:19,523 INFO [main-EventThread]
client.HConnectionManager$HConnectionImplementation(1624): This client just
lost it's session with ZooKeeper, trying to reconnect.
2011-11-16 10:21:19,523 INFO [main-EventThread]
client.HConnectionManager$HConnectionImplementation(541): Trying to reconnect
to zookeeper
2011-11-16 10:21:19,524 DEBUG [main-EventThread] zookeeper.ZKUtil(95):
hconnection opening connection to ZooKeeper with ensemble (localhost:56525)
2011-11-16 10:21:19,526 INFO [main-EventThread]
zookeeper.RecoverableZooKeeper(89): The identifier of this process is
19557@ubuntu
2011-11-16 10:21:19,532 DEBUG [main] zookeeper.ZKUtil(95): Bad Root Location
Writer opening connection to ZooKeeper with ensemble (localhost:56525)
2011-11-16 10:21:19,533 INFO [main] zookeeper.RecoverableZooKeeper(89): The
identifier of this process is 19557@ubuntu
2011-11-16 10:21:19,534 INFO [main] catalog.RootLocationEditor(62): Setting
ROOT region location in ZooKeeper as example.org,1234,1321467679534
2011-11-16 10:21:19,536 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(238): hconnection Received ZooKeeper Event,
type=None, state=SyncConnected, path=null
2011-11-16 10:21:19,537 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(238): Bad Root Location Writer Received ZooKeeper
Event, type=None, state=SyncConnected, path=null
2011-11-16 10:21:19,537 DEBUG [main-EventThread] zookeeper.ZKUtil(230):
hconnection /hbase/master does not exist. Watcher is set.
2011-11-16 10:21:19,538 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(315): hconnection-0x133ad9c76a60004 connected
2011-11-16 10:21:19,539 ERROR [main] zookeeper.RecoverableZooKeeper(478): Node
/hbase/root-region-server already exists and this is not a retry
2011-11-16 10:21:19,539 DEBUG [main-EventThread] zookeeper.ZKUtil(228):
hconnection-0x133ad9c76a60004 Set watcher on existing znode
/hbase/root-region-server
2011-11-16 10:21:19,540 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(315): Bad Root Location Writer-0x133ad9c76a60005
connected
2011-11-16 10:21:19,540 DEBUG [main] catalog.RootLocationEditor(67): ROOT
region location already existed, updated location
2011-11-16 10:21:19,540 DEBUG [main-EventThread] zookeeper.ZKUtil(1034):
hconnection-0x133ad9c76a60004 Retrieved 29 byte(s) of data from znode
/hbase/root-region-server and set watcher; localhost,44843,1321467673319
2011-11-16 10:21:19,540 INFO [main-EventThread]
client.HConnectionManager$HConnectionImplementation(1627): Reconnected
successfully. This disconnect could have been caused by a network partition or
a long-running GC pause, either way it's recommended that you verify your
environment.
2011-11-16 10:21:19,542 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(238): hconnection-0x133ad9c76a60004 Received
ZooKeeper Event, type=NodeDataChanged, state=SyncConnected,
path=/hbase/root-region-server
2011-11-16 10:21:19,544 DEBUG [main-EventThread] zookeeper.ZKUtil(1034):
hconnection-0x133ad9c76a60004 Retrieved 30 byte(s) of data from znode
/hbase/root-region-server and set watcher; example.org,1234,1321467679534
2011-11-16 10:21:19,555 DEBUG [main] util.FSUtils(288): Created version file at
hdfs://localhost:40123/user/nkeywal set its version at:7
2011-11-16 10:21:19,568 DEBUG [main] client.HConnectionManager(1805): Set
serverside HConnection retries=100
2011-11-16 10:21:19,573 INFO [main] ipc.HBaseRpcMetrics(59): Initializing RPC
Metrics with hostName=HMaster, port=50466
2011-11-16 10:21:19,582 DEBUG [main] zookeeper.ZKUtil(95): master:50466 opening
connection to ZooKeeper with ensemble (localhost:56525)
2011-11-16 10:21:19,584 INFO [main] zookeeper.RecoverableZooKeeper(89): The
identifier of this process is 19557@ubuntu
2011-11-16 10:21:19,588 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(238): master:50466 Received ZooKeeper Event,
type=None, state=SyncConnected, path=null
2011-11-16 10:21:19,590 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(315): master:50466-0x133ad9c76a60006 connected
2011-11-16 10:21:19,592 INFO [main] metrics.MetricsMBeanBase(111): new
MBeanInfo
2011-11-16 10:21:19,592 INFO [main] metrics.MasterMetrics(89): Initialized
2011-11-16 10:21:19,613 DEBUG [main] client.HConnectionManager(1805): Set
serverside HConnection retries=100
2011-11-16 10:21:19,619 INFO [main] ipc.HBaseRpcMetrics(59): Initializing RPC
Metrics with hostName=MiniHBaseCluster$MiniHBaseClusterRegionServer, port=55575
2011-11-16 10:21:19,625 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(238): hconnection-0x133ad9c76a60004 Received
ZooKeeper Event, type=NodeCreated, state=SyncConnected, path=/hbase/master
2011-11-16 10:21:19,625 INFO [Master:0;localhost,50466,1321467679574]
master.ActiveMasterManager(139): Master=localhost,50466,1321467679574
2011-11-16 10:21:19,626 DEBUG [Master:0;localhost,50466,1321467679574]
master.SplitLogManager(159): timeout = 25000
2011-11-16 10:21:19,626 DEBUG [Master:0;localhost,50466,1321467679574]
master.SplitLogManager(160): unassigned timeout = 180000
2011-11-16 10:21:19,626 DEBUG [main-EventThread] zookeeper.ZKUtil(1034):
hconnection-0x133ad9c76a60004 Retrieved 31 byte(s) of data from znode
/hbase/master and set watcher; \x00\x00localhost,50466,13214...
2011-11-16 10:21:19,627 INFO [Master:0;localhost,50466,1321467679574]
master.SplitLogManager(684): found 0 orphan tasks and 0 rescan nodes
2011-11-16 10:21:19,647 DEBUG [Master:0;localhost,50466,1321467679574]
zookeeper.ZKUtil(95): hconnection opening connection to ZooKeeper with ensemble
(localhost:56525)
2011-11-16 10:21:19,648 INFO [Master:0;localhost,50466,1321467679574]
zookeeper.RecoverableZooKeeper(89): The identifier of this process is
19557@ubuntu
2011-11-16 10:21:19,653 DEBUG
[Master:0;localhost,50466,1321467679574-EventThread]
zookeeper.ZooKeeperWatcher(238): hconnection Received ZooKeeper Event,
type=None, state=SyncConnected, path=null
2011-11-16 10:21:19,653 DEBUG [Master:0;localhost,50466,1321467679574]
zookeeper.ZKUtil(228): hconnection Set watcher on existing znode /hbase/master
2011-11-16 10:21:19,654 DEBUG
[Master:0;localhost,50466,1321467679574-EventThread]
zookeeper.ZooKeeperWatcher(315): hconnection-0x133ad9c76a60007 connected
2011-11-16 10:21:19,654 DEBUG [Master:0;localhost,50466,1321467679574]
zookeeper.ZKUtil(1034): hconnection-0x133ad9c76a60007 Retrieved 31 byte(s) of
data from znode /hbase/master and set watcher; \x00\x00localhost,50466,13214...
2011-11-16 10:21:19,655 DEBUG [Master:0;localhost,50466,1321467679574]
zookeeper.ZKUtil(228): hconnection-0x133ad9c76a60007 Set watcher on existing
znode /hbase/root-region-server
2011-11-16 10:21:19,656 DEBUG [Master:0;localhost,50466,1321467679574]
zookeeper.ZKUtil(1034): hconnection-0x133ad9c76a60007 Retrieved 30 byte(s) of
data from znode /hbase/root-region-server and set watcher;
example.org,1234,1321467679534
2011-11-16 10:21:19,656 DEBUG [Master:0;localhost,50466,1321467679574]
catalog.CatalogTracker(217): Starting catalog tracker
org.apache.hadoop.hbase.catalog.CatalogTracker@2e18ff27
2011-11-16 10:21:19,657 DEBUG [Master:0;localhost,50466,1321467679574]
zookeeper.ZKUtil(228): master:50466-0x133ad9c76a60006 Set watcher on existing
znode /hbase/root-region-server
2011-11-16 10:21:19,658 DEBUG [Master:0;localhost,50466,1321467679574]
zookeeper.ZKUtil(1034): master:50466-0x133ad9c76a60006 Retrieved 30 byte(s) of
data from znode /hbase/root-region-server and set watcher;
example.org,1234,1321467679534
2011-11-16 10:21:19,658 DEBUG [Master:0;localhost,50466,1321467679574]
zookeeper.ZKUtil(230): master:50466-0x133ad9c76a60006
/hbase/unassigned/1028785192 does not exist. Watcher is set.
2011-11-16 10:21:19,660 DEBUG [Master:0;localhost,50466,1321467679574]
zookeeper.ZKUtil(230): master:50466-0x133ad9c76a60006 /hbase/shutdown does not
exist. Watcher is set.
2011-11-16 10:21:19,663 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(238): master:50466-0x133ad9c76a60006 Received
ZooKeeper Event, type=NodeCreated, state=SyncConnected, path=/hbase/shutdown
2011-11-16 10:21:19,664 INFO [Master:0;localhost,50466,1321467679574]
master.HMaster(387): Server active/primary master;
localhost,50466,1321467679574, sessionid=0x133ad9c76a60006, cluster-up flag
was=false
2011-11-16 10:21:19,664 DEBUG [Master:0;localhost,50466,1321467679574]
executor.ExecutorService(190): Starting executor service
name=MASTER_OPEN_REGION-localhost,50466,1321467679574, corePoolSize=5,
maxPoolSize=5
2011-11-16 10:21:19,664 DEBUG [Master:0;localhost,50466,1321467679574]
executor.ExecutorService(190): Starting executor service
name=MASTER_CLOSE_REGION-localhost,50466,1321467679574, corePoolSize=5,
maxPoolSize=5
2011-11-16 10:21:19,664 DEBUG [Master:0;localhost,50466,1321467679574]
executor.ExecutorService(190): Starting executor service
name=MASTER_SERVER_OPERATIONS-localhost,50466,1321467679574, corePoolSize=3,
maxPoolSize=3
2011-11-16 10:21:19,664 DEBUG [Master:0;localhost,50466,1321467679574]
executor.ExecutorService(190): Starting executor service
name=MASTER_META_SERVER_OPERATIONS-localhost,50466,1321467679574,
corePoolSize=5, maxPoolSize=5
2011-11-16 10:21:19,665 DEBUG [Master:0;localhost,50466,1321467679574]
executor.ExecutorService(190): Starting executor service
name=MASTER_TABLE_OPERATIONS-localhost,50466,1321467679574, corePoolSize=1,
maxPoolSize=1
2011-11-16 10:21:19,665 DEBUG [Master:0;localhost,50466,1321467679574]
master.LogCleaner(115): Add log cleaner in chain:
org.apache.hadoop.hbase.master.TimeToLiveLogCleaner
2011-11-16 10:21:19,665 DEBUG [main-EventThread] zookeeper.ZKUtil(1034):
master:50466-0x133ad9c76a60006 Retrieved 28 byte(s) of data from znode
/hbase/shutdown and set watcher; Wed Nov 16 10:21:19 PST 2011
2011-11-16 10:21:19,666 DEBUG [Master:0;localhost,50466,1321467679574]
master.HMaster(703): Started service threads
2011-11-16 10:21:19,723 INFO [main] regionserver.ShutdownHook(85): Installed
shutdown hook thread: Shutdownhook:RegionServer:0;localhost,55575,1321467679622
2011-11-16 10:21:19,726 DEBUG [RegionServer:0;localhost,55575,1321467679622]
zookeeper.ZKUtil(95): regionserver:55575 opening connection to ZooKeeper with
ensemble (localhost:56525)
2011-11-16 10:21:19,727 INFO [RegionServer:0;localhost,55575,1321467679622]
zookeeper.RecoverableZooKeeper(89): The identifier of this process is
19557@ubuntu
2011-11-16 10:21:19,733 DEBUG
[RegionServer:0;localhost,55575,1321467679622-EventThread]
zookeeper.ZooKeeperWatcher(238): regionserver:55575 Received ZooKeeper Event,
type=None, state=SyncConnected, path=null
2011-11-16 10:21:19,733 DEBUG [RegionServer:0;localhost,55575,1321467679622]
zookeeper.ZKUtil(228): regionserver:55575 Set watcher on existing znode
/hbase/master
2011-11-16 10:21:19,734 DEBUG [RegionServer:0;localhost,55575,1321467679622]
zookeeper.ZKUtil(1034): regionserver:55575 Retrieved 31 byte(s) of data from
znode /hbase/master and set watcher; \x00\x00localhost,50466,13214...
2011-11-16 10:21:19,734 DEBUG
[RegionServer:0;localhost,55575,1321467679622-EventThread]
zookeeper.ZooKeeperWatcher(315): regionserver:55575-0x133ad9c76a60008 connected
2011-11-16 10:21:19,735 DEBUG [RegionServer:0;localhost,55575,1321467679622]
zookeeper.ZKUtil(228): regionserver:55575-0x133ad9c76a60008 Set watcher on
existing znode /hbase/shutdown
2011-11-16 10:21:19,736 DEBUG [RegionServer:0;localhost,55575,1321467679622]
zookeeper.ZKUtil(1034): regionserver:55575-0x133ad9c76a60008 Retrieved 28
byte(s) of data from znode /hbase/shutdown and set watcher; Wed Nov 16 10:21:19
PST 2011
2011-11-16 10:21:19,736 DEBUG [RegionServer:0;localhost,55575,1321467679622]
catalog.CatalogTracker(217): Starting catalog tracker
org.apache.hadoop.hbase.catalog.CatalogTracker@35d56bbe
2011-11-16 10:21:19,737 DEBUG [RegionServer:0;localhost,55575,1321467679622]
zookeeper.ZKUtil(228): regionserver:55575-0x133ad9c76a60008 Set watcher on
existing znode /hbase/root-region-server
2011-11-16 10:21:19,738 DEBUG [RegionServer:0;localhost,55575,1321467679622]
zookeeper.ZKUtil(1034): regionserver:55575-0x133ad9c76a60008 Retrieved 30
byte(s) of data from znode /hbase/root-region-server and set watcher;
example.org,1234,1321467679534
2011-11-16 10:21:19,738 DEBUG [RegionServer:0;localhost,55575,1321467679622]
zookeeper.ZKUtil(230): regionserver:55575-0x133ad9c76a60008
/hbase/unassigned/1028785192 does not exist. Watcher is set.
2011-11-16 10:21:19,739 INFO [RegionServer:0;localhost,55575,1321467679622]
regionserver.MemStoreFlusher(114): globalMemStoreLimit=497.8m,
globalMemStoreLimitLowMark=435.6m, maxHeap=1.2g
2011-11-16 10:21:19,739 INFO [RegionServer:0;localhost,55575,1321467679622]
regionserver.HRegionServer$CompactionChecker(1158): Runs every 16mins, 40sec
2011-11-16 10:21:19,767 INFO [Master:0;localhost,50466,1321467679574]
master.ServerManager(530): Waiting on regionserver(s) to checkin
2011-11-16 10:21:19,774 INFO [RegionServer:0;localhost,55575,1321467679622]
regionserver.HRegionServer(1737): Attempting connect to Master server at
localhost,50466,1321467679574
2011-11-16 10:21:19,779 INFO [RegionServer:0;localhost,55575,1321467679622]
regionserver.HRegionServer(1766): Connected to master at
localhost/127.0.0.1:50466
2011-11-16 10:21:19,779 INFO [RegionServer:0;localhost,55575,1321467679622]
regionserver.HRegionServer(1792): Telling master at
localhost,50466,1321467679574 that we are up with port=55575,
startcode=1321467679622
2011-11-16 10:21:19,780 INFO [IPC Server handler 2 on 50466]
master.ServerManager(239): Registering server=localhost,55575,1321467679622
2011-11-16 10:21:19,781 INFO [RegionServer:0;localhost,55575,1321467679622]
regionserver.HRegionServer(911): Master passed us hostname to use.
Was=localhost, Now=localhost
2011-11-16 10:21:19,781 DEBUG [RegionServer:0;localhost,55575,1321467679622]
regionserver.HRegionServer(918): Config from master:
fs.default.name=hdfs://localhost:40123
2011-11-16 10:21:19,782 DEBUG [RegionServer:0;localhost,55575,1321467679622]
regionserver.HRegionServer(918): Config from master:
hbase.rootdir=hdfs://localhost:40123/user/nkeywal
2011-11-16 10:21:19,783 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(238): master:50466-0x133ad9c76a60006 Received
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/rs
2011-11-16 10:21:19,786 DEBUG [main-EventThread] zookeeper.ZKUtil(228):
master:50466-0x133ad9c76a60006 Set watcher on existing znode
/hbase/rs/localhost,55575,1321467679622
2011-11-16 10:21:19,788 DEBUG [RegionServer:0;localhost,55575,1321467679622]
regionserver.HRegionServer(1219):
logdir=hdfs://localhost:40123/user/nkeywal/.logs/localhost,55575,1321467679622
2011-11-16 10:21:19,805 INFO [RegionServer:0;localhost,55575,1321467679622]
wal.HLog(417): HLog configuration: blocksize=64 MB, rollsize=60.8 MB,
enabled=true, optionallogflushinternal=1000ms
2011-11-16 10:21:19,806 DEBUG [RegionServer:0;localhost,55575,1321467679622]
wal.SequenceFileLogWriter(113): new createWriter -- HADOOP-6840 -- not available
2011-11-16 10:21:19,810 DEBUG [RegionServer:0;localhost,55575,1321467679622]
wal.SequenceFileLogWriter(136):
Path=hdfs://localhost:40123/user/nkeywal/.logs/localhost,55575,1321467679622/localhost%2C55575%2C1321467679622.1321467679806,
syncFs=true, hflush=false
2011-11-16 10:21:19,810 INFO [RegionServer:0;localhost,55575,1321467679622]
wal.HLog(614): for
/user/nkeywal/.logs/localhost,55575,1321467679622/localhost%2C55575%2C1321467679622.1321467679806
2011-11-16 10:21:19,810 INFO [RegionServer:0;localhost,55575,1321467679622]
wal.HLog(459): Using getNumCurrentReplicas--HDFS-826
2011-11-16 10:21:19,811 INFO [RegionServer:0;localhost,55575,1321467679622]
metrics.MetricsMBeanBase(111): new MBeanInfo
2011-11-16 10:21:19,811 INFO [RegionServer:0;localhost,55575,1321467679622]
metrics.RegionServerMetrics(248): Initialized
2011-11-16 10:21:19,812 DEBUG [RegionServer:0;localhost,55575,1321467679622]
executor.ExecutorService(190): Starting executor service
name=RS_OPEN_REGION-localhost,55575,1321467679622, corePoolSize=3, maxPoolSize=3
2011-11-16 10:21:19,812 DEBUG [RegionServer:0;localhost,55575,1321467679622]
executor.ExecutorService(190): Starting executor service
name=RS_OPEN_ROOT-localhost,55575,1321467679622, corePoolSize=1, maxPoolSize=1
2011-11-16 10:21:19,812 DEBUG [RegionServer:0;localhost,55575,1321467679622]
executor.ExecutorService(190): Starting executor service
name=RS_OPEN_META-localhost,55575,1321467679622, corePoolSize=1, maxPoolSize=1
2011-11-16 10:21:19,813 DEBUG [RegionServer:0;localhost,55575,1321467679622]
executor.ExecutorService(190): Starting executor service
name=RS_CLOSE_REGION-localhost,55575,1321467679622, corePoolSize=3,
maxPoolSize=3
2011-11-16 10:21:19,813 DEBUG [RegionServer:0;localhost,55575,1321467679622]
executor.ExecutorService(190): Starting executor service
name=RS_CLOSE_ROOT-localhost,55575,1321467679622, corePoolSize=1, maxPoolSize=1
2011-11-16 10:21:19,813 DEBUG [RegionServer:0;localhost,55575,1321467679622]
executor.ExecutorService(190): Starting executor service
name=RS_CLOSE_META-localhost,55575,1321467679622, corePoolSize=1, maxPoolSize=1
2011-11-16 10:21:19,860 INFO [RegionServer:0;localhost,55575,1321467679622]
regionserver.HRegionServer(947): Serving as localhost,55575,1321467679622, RPC
listening on localhost/127.0.0.1:55575, sessionid=0x133ad9c76a60008
2011-11-16 10:21:19,860 INFO [SplitLogWorker-localhost,55575,1321467679622]
regionserver.SplitLogWorker(140): SplitLogWorker localhost,55575,1321467679622
starting
2011-11-16 10:21:19,868 INFO [Master:0;localhost,50466,1321467679574]
master.ServerManager(530): Waiting on regionserver(s) count to settle;
currently=1
2011-11-16 10:21:19,973 INFO [Master:0;localhost,50466,1321467679574]
master.MasterFileSystem(212): Log folder
hdfs://localhost:40123/user/nkeywal/.logs/localhost,55575,1321467679622 belongs
to an existing region server
2011-11-16 10:21:19,973 INFO [Master:0;localhost,50466,1321467679574]
master.MasterFileSystem(266): No logs to split
2011-11-16 10:21:19,975 DEBUG [Master:0;localhost,50466,1321467679574]
zookeeper.ZKUtil(518): master:50466-0x133ad9c76a60006 Unable to get data of
znode /hbase/unassigned/70236052 because node does not exist (not an error)
2011-11-16 10:21:19,978 DEBUG [Master:0;localhost,50466,1321467679574]
zookeeper.ZKUtil(1034): master:50466-0x133ad9c76a60006 Retrieved 30 byte(s) of
data from znode /hbase/root-region-server and set watcher;
example.org,1234,1321467679534
2011-11-16 10:21:19,981 DEBUG [Master:0;localhost,50466,1321467679574]
zookeeper.ZKUtil(1034): hconnection-0x133ad9c76a60007 Retrieved 36 byte(s) of
data from znode /hbase/hbaseid; data=03980100-b09a-48c1-9bf2-70f49...
2011-11-16 10:21:24,550 INFO [Thread-133] catalog.RootLocationEditor(43):
Unsetting ROOT region location in ZooKeeper
2011-11-16 10:21:24,575 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(238): master:50466-0x133ad9c76a60006 Received
ZooKeeper Event, type=NodeDeleted, state=SyncConnected,
path=/hbase/root-region-server
2011-11-16 10:21:24,575 DEBUG
[Master:0;localhost,50466,1321467679574-EventThread]
zookeeper.ZooKeeperWatcher(238): hconnection-0x133ad9c76a60007 Received
ZooKeeper Event, type=NodeDeleted, state=SyncConnected,
path=/hbase/root-region-server
2011-11-16 10:21:24,576 DEBUG [main-EventThread]
zookeeper.ZooKeeperWatcher(238): hconnection-0x133ad9c76a60004 Received
ZooKeeper Event, type=NodeDeleted, state=SyncConnected,
path=/hbase/root-region-server
2011-11-16 10:21:24,580 DEBUG
[RegionServer:0;localhost,55575,1321467679622-EventThread]
zookeeper.ZooKeeperWatcher(238): regionserver:55575-0x133ad9c76a60008 Received
ZooKeeper Event, type=NodeDeleted, state=SyncConnected,
path=/hbase/root-region-server
2011-11-16 10:21:24,581 INFO [Thread-133]
catalog.TestCatalogTrackerOnCluster$2(85): RootLocationEditor is now clean.
2011-11-16 10:21:24,581 DEBUG
[Master:0;localhost,50466,1321467679574-EventThread] zookeeper.ZKUtil(230):
hconnection-0x133ad9c76a60007 /hbase/root-region-server does not exist. Watcher
is set.
2011-11-16 10:21:24,584 DEBUG [main-EventThread] zookeeper.ZKUtil(230):
hconnection-0x133ad9c76a60004 /hbase/root-region-server does not exist. Watcher
is set.
2011-11-16 10:21:24,584 DEBUG [main-EventThread] zookeeper.ZKUtil(230):
master:50466-0x133ad9c76a60006 /hbase/root-region-server does not exist.
Watcher is set.
2011-11-16 10:21:24,585 DEBUG
[RegionServer:0;localhost,55575,1321467679622-EventThread]
zookeeper.ZKUtil(230): regionserver:55575-0x133ad9c76a60008
/hbase/root-region-server does not exist. Watcher is set.
2011-11-16 10:21:49,769 ERROR [main] hbase.MiniHBaseCluster(201): Error
starting cluster
java.lang.RuntimeException: Master not initialized after 30 seconds
at
org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206)
at
org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:418)
{noformat}
> Sleeps and synchronisation improvements for tests
> -------------------------------------------------
>
> Key: HBASE-4798
> URL: https://issues.apache.org/jira/browse/HBASE-4798
> Project: HBase
> Issue Type: Improvement
> Components: master, regionserver, test
> Affects Versions: 0.94.0
> Environment: all
> Reporter: nkeywal
> Assignee: nkeywal
> Priority: Minor
> Attachments: 4798_trunk_all.v2.patch
>
>
> Multiple small changes:
> @commiters: Removing some sleeps made visible a bug on
> JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchro point.
> You may want to review this.
> JVMClusterUtil#HMaster#waitForServerOnline: removed, the condition was never
> met (test on "!c && !!c"). Added a new synchronization point.
> AssignementManager#waitForAssignment: add a timeout on the wait => not stuck
> if the notification is received before the wait.
> HMaster#loop: use a notification instead of a 1s sleep
> HRegionServer#waitForServerOnline: new method used by
> JVMClusterUtil#waitForServerOnline() to replace a 1s sleep by a notification
> HRegionServer#getMaster() 1s sleeps replaced by one 0,1s sleep and one 0,2s
> sleep
> HRegionServer#stop: use a notification on sleeper to lower shutdown by 0,5s
> ZooKeeperNodeTracker#start: replace a recursive call by a loop
> ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait => not
> stuck if the notification is received before the wait.
> HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s
> TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s,
> with the change on HBaseTestingUtility we are 60s faster
> TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0,2s instead
> of 1s
> TestRestartCluster#testClusterRestart: send all the table creation together,
> then check creation, should be faster
> TestHLog: shutdown the whole cluster instead of DFS only (more standard)
> JVMClusterUtil#startup: lower the sleep from 1s to 0,1s
> HConnectionManager#close: Zookeeper name in debug message from
> HConnectionManager after connection close was always null because it was set
> to null in the delete.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira