Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager
----------------------------------------------------------------------------------

                 Key: HBASE-4455
                 URL: https://issues.apache.org/jira/browse/HBASE-4455
             Project: HBase
          Issue Type: Bug
            Reporter: Ming Ma


Keep the Master up the whole time and do a rolling restart of the RSs like this:
stop RS1, wait 2 seconds, stop RS2, start RS1, wait 2 seconds, stop RS3, start
RS2, wait 2 seconds, and so on. After a while, the -ROOT- and .META. regions are
not in "regions in transition" from the AssignmentManager's point of view, yet
they are not assigned to any region server either. Here are the issues.

1. The -ROOT- or .META. location is stale when MetaServerShutdownHandler is
invoked to check whether the dead server hosted the -ROOT- region. This is due to
the long delay of ZK notifications and the asynchronous nature of the system. For
example, even though the new root region server sea-lab-1,60020,1316380133656 is
set at T2, at T3, when the shutdown process for sea-lab-1,60020,1316380133656
starts, the root location still points to the old server
sea-lab-3,60020,1316380037898. The master does not read the new location from ZK
until T4.




T1: 2011-09-18 14:08:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x1327e43175e0000 Retrieved 29 byte(s) of data from znode
/hbase/root-region-server and set watcher; sea-lab-3,60020,1316380037898

T2: 2011-09-18 14:08:57,173 INFO 
org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region 
location in ZooKeeper as sea-lab-1,60020,1316380133656


T3: 2011-09-18 14:10:26,393 DEBUG org.apache.hadoop.hbase.master.ServerManager:
Added=sea-lab-1,60020,1316380133656 to dead servers, submitted shutdown handler
to be executed, root=false, meta=true, current Root Location:
sea-lab-3,60020,1316380037898

T4: 2011-09-18 14:12:37,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x1327e43175e0000 Retrieved 29 byte(s) of data from znode
/hbase/root-region-server and set watcher; sea-lab-1,60020,1316380133656
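To make the race in issue 1 concrete, here is a minimal, self-contained Java model of it. It is not HBase code: StaleRootLocationDemo, znode, cached, deliverNotification(), carriesRootCached() and carriesRootFresh() are hypothetical names, and the two AtomicReferences only stand in for the /hbase/root-region-server znode and the master's cached copy of it. It replays the T1 to T4 sequence from the log above and shows how a check against the cached copy reports root=false for the server that actually holds -ROOT-.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of issue 1, not HBase code.
public class StaleRootLocationDemo {
    // Stands in for the authoritative znode /hbase/root-region-server.
    static final AtomicReference<String> znode = new AtomicReference<>();
    // Stands in for the master's cached copy, refreshed only when a watch
    // notification is processed.
    static final AtomicReference<String> cached = new AtomicReference<>();

    // Simulates the master finally processing a (possibly late) notification.
    static void deliverNotification() {
        cached.set(znode.get());
    }

    // Decision based on the cached copy: does the dead server carry -ROOT-?
    static boolean carriesRootCached(String deadServer) {
        return deadServer.equals(cached.get());
    }

    // Decision based on re-reading the authoritative value at shutdown time.
    static boolean carriesRootFresh(String deadServer) {
        return deadServer.equals(znode.get());
    }

    public static void main(String[] args) {
        String rs3 = "sea-lab-3,60020,1316380037898";
        String rs1 = "sea-lab-1,60020,1316380133656";

        znode.set(rs3);
        deliverNotification();  // T1: master learns -ROOT- is on RS3
        znode.set(rs1);         // T2: -ROOT- moves to RS1, notification still pending
        // T3: RS1 dies before the T2 notification has been processed
        System.out.println("cached check: root=" + carriesRootCached(rs1)); // root=false
        System.out.println("fresh check:  root=" + carriesRootFresh(rs1));  // root=true
        deliverNotification();  // T4: notification finally processed
        System.out.println("cached after T4: " + cached.get());             // RS1's name
    }
}
```

Re-reading the authoritative location when the dead server is processed would only narrow the window, not close it: ZK watches are one-shot triggers and the location can still move right after the read. The sketch is only meant to show why a late notification yields root=false at T3.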


2. The MetaServerShutdownHandler worker thread that waits for -ROOT- or .META.
to become available can block. If, in the meantime, the new server to which
-ROOT- or .META. is being assigned is restarted, another MetaServerShutdownHandler
instance is queued. Eventually, all MetaServerShutdownHandler worker threads are
occupied by blocked handlers. This looks similar to HBASE-4245.
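The starvation in issue 2 can be modelled with a plain java.util.concurrent fixed-size pool. In this hypothetical sketch (HandlerPoolStarvationDemo, WORKERS and simulate() are illustrative names, and the pool size of 3 is an assumption, not the master's actual configuration), each handler parks on an untimed wait that stands in for "wait until .META. is available". Submitting more handlers than there are workers, as repeated restarts of the target server would, leaves every worker blocked and the extra handlers stuck in the queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical model of issue 2, not HBase code.
public class HandlerPoolStarvationDemo {
    static final int WORKERS = 3; // assumed pool size, for illustration only

    // Submits `handlers` tasks that each block until .META. is "available" and
    // returns {activeWorkers, queuedHandlers} observed while they are blocked.
    static int[] simulate(int handlers) {
        CountDownLatch metaAvailable = new CountDownLatch(1);
        CountDownLatch workersBlocked = new CountDownLatch(Math.min(handlers, WORKERS));
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                WORKERS, WORKERS, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < handlers; i++) {
            pool.execute(() -> {
                workersBlocked.countDown();
                try {
                    metaAvailable.await(); // untimed wait, like waiting for .META.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            workersBlocked.await(5, TimeUnit.SECONDS);
            return new int[] {pool.getActiveCount(), pool.getQueue().size()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            metaAvailable.countDown(); // release the workers so the demo can exit
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] s = simulate(5);
        System.out.println("active=" + s[0] + " queued=" + s[1]); // active=3 queued=2
    }
}
```

The untimed await() is what lets the pool starve; with await(timeout, unit) a worker would at least be handed back periodically, though whether that is the right fix for the master is a separate question.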


