[ https://issues.apache.org/jira/browse/HBASE-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642348#comment-13642348 ]
stack commented on HBASE-8422:
------------------------------
[~lhofhansl] This fixes the issue where I shut down a master that was waiting
on regionservers and it would not go down.
With this patch in place, the master should still stay up and wait forever, as
it used to.
Regarding the worrisome bit of code, I think your worries will be alleviated if
you check where the code is located: it runs as part of our shutdown, on our
way down.
> Master won't go down. Stuck waiting on .META. to come on line.
> ---------------------------------------------------------------
>
> Key: HBASE-8422
> URL: https://issues.apache.org/jira/browse/HBASE-8422
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.95.0
> Reporter: stack
> Assignee: rajeshbabu
> Fix For: 0.98.0, 0.94.8, 0.95.1
>
> Attachments: HBASE-8422_2.patch, HBASE-8422_3.patch,
> HBASE-8422_94.patch, HBASE-8422.patch
>
>
> Master came up w/ no regionservers. I then tried to shut it down. You can
> see below that it started to go down...
> {code}
> 2013-04-24 14:28:49,770 INFO [IPC Server handler 7 on 60000]
> org.apache.hadoop.hbase.master.HMaster: Cluster shutdown requested
> 2013-04-24 14:28:49,815 INFO
> [master-stack-1.ent.cloudera.com,60000,1366838923135]
> org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region
> servers count to settle; checked in 0, slept for 2818 ms, expecting minimum
> of 1, maximum of 2147483647, master is stopped.
> 2013-04-24 14:28:49,815 WARN
> [master-stack-1.ent.cloudera.com,60000,1366838923135]
> org.apache.hadoop.hbase.master.MasterFileSystem: Master stopped while
> splitting logs
> 2013-04-24 14:28:50,104 INFO
> [stack-1.ent.cloudera.com,60000,1366838923135.splitLogManagerTimeoutMonitor]
> org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor:
> stack-1.ent.cloudera.com,60000,1366838923135.splitLogManagerTimeoutMonitor
> exiting
> 2013-04-24 14:28:50,850 INFO
> [master-stack-1.ent.cloudera.com,60000,1366838923135]
> org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker: Unsetting META region
> location in ZooKeeper
> 2013-04-24 14:28:50,884 WARN
> [master-stack-1.ent.cloudera.com,60000,1366838923135]
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node
> /hbase/meta-region-server already deleted, retry=false
> 2013-04-24 14:28:50,884 INFO
> [master-stack-1.ent.cloudera.com,60000,1366838923135]
> org.apache.hadoop.hbase.master.AssignmentManager: Cluster shutdown is set;
> skipping assign of .META.,,1.1028785192
> 2013-04-24 14:28:50,884 INFO
> [master-stack-1.ent.cloudera.com,60000,1366838923135]
> org.apache.hadoop.hbase.master.ServerManager: AssignmentManager hasn't
> finished failover cleanup
> 2013-04-24 14:29:46,188 INFO
> [master-stack-1.ent.cloudera.com,60000,1366838923135.oldLogCleaner]
> org.apache.hadoop.hbase.master.cleaner.LogCleaner:
> master-stack-1.ent.cloudera.com,60000,1366838923135.oldLogCleaner exiting
> 2013-04-24 14:29:46,193 INFO
> [master-stack-1.ent.cloudera.com,60000,1366838923135.archivedHFileCleaner]
> org.apache.hadoop.hbase.master.cleaner.HFileCleaner:
> master-stack-1.ent.cloudera.com,60000,1366838923135.archivedHFileCleaner
> exiting
> {code}
> ... but now it is stuck.
> We keep looping here:
> {code}
> "master-stack-1.ent.cloudera.com,60000,1366838923135" prio=10 tid=0x00007f154853f000 nid=0x18b in Object.wait() [0x00007f1545fde000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000c727d738> (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
> at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:161)
> - locked <0x00000000c727d738> (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
> at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.waitMetaRegionLocation(MetaRegionTracker.java:105)
> at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:250)
> at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:299)
> at org.apache.hadoop.hbase.master.HMaster.enableSSHandWaitForMeta(HMaster.java:905)
> at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:879)
> at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:764)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:522)
> at java.lang.Thread.run(Thread.java:722)
> {code}
> Odd. It is supposed to be checking the 'stopped' flag; maybe it has the wrong
> stop flag.
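
For illustration only, here is a minimal sketch of the kind of wait loop the description has in mind: one that re-checks a stop flag between short waits, so a shutdown request is noticed even if the awaited data never arrives. It is not the attached patch and not HBase's actual code. The class StopAwareTracker, its fields, and the sliceMs parameter are all hypothetical names invented for this example.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a node tracker; NOT HBase's ZooKeeperNodeTracker. */
class StopAwareTracker {
    // Assumption: the caller passes in the flag that shutdown actually sets.
    private final BooleanSupplier stopped;
    // null until the tracked node's data shows up.
    private byte[] data;

    StopAwareTracker(BooleanSupplier stopped) {
        this.stopped = stopped;
    }

    /** Records the data and wakes up any thread blocked in blockUntilAvailable. */
    synchronized void setData(byte[] newData) {
        this.data = newData;
        notifyAll();
    }

    /**
     * Waits in slices of at most sliceMs (must be > 0) so the stop flag is
     * re-checked regularly. timeoutMs == 0 means no timeout.
     * Returns the data, or null if stopped or timed out before it arrived.
     */
    synchronized byte[] blockUntilAvailable(long timeoutMs, long sliceMs)
            throws InterruptedException {
        long deadline = timeoutMs == 0
                ? Long.MAX_VALUE
                : System.currentTimeMillis() + timeoutMs;
        while (data == null && !stopped.getAsBoolean()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                break; // timed out
            }
            wait(Math.min(sliceMs, remaining));
        }
        return data;
    }
}
```

The point of the sketch is the loop condition: if the waiter is handed a flag that shutdown never sets (the "wrong stop flag" suspected above), the loop keeps spinning exactly as in the thread dump.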