rajeshbabu created HBASE-9968:
---------------------------------

             Summary: Cluster is non operative if the RS carrying -ROOT- is 
expiring after deleting -ROOT- region transition znode and before adding it to 
online regions.
                 Key: HBASE-9968
                 URL: https://issues.apache.org/jira/browse/HBASE-9968
             Project: HBase
          Issue Type: Bug
          Components: Region Assignment
    Affects Versions: 0.94.11
            Reporter: rajeshbabu
            Assignee: rajeshbabu


When we check whether the dead region server was carrying root or meta, we 
first check whether a transition znode exists for the region. In this case it 
had already been deleted, so we cannot find the region location from ZooKeeper.
{code}
    try {
      data = ZKAssign.getData(master.getZooKeeper(), hri.getEncodedName());
    } catch (KeeperException e) {
      master.abort("Unexpected ZK exception reading unassigned node for region="
        + hri.getEncodedName(), e);
    }
{code}
Next we check with the AssignmentManager whether the region is in its online 
regions:
{code}
    ServerName addressFromAM = getRegionServerOfRegion(hri);
    boolean matchAM = (addressFromAM != null &&
      addressFromAM.equals(serverName));
    LOG.debug("based on AM, current region=" + hri.getRegionNameAsString() +
      " is on server=" + (addressFromAM != null ? addressFromAM : "null") +
      " server being checked: " + serverName);
{code}
From the AM we get null, because when adding a region to the online regions we 
check whether the RS is in the online servers, and if it is not we do not add 
the region.
{code}
      if (isServerOnline(sn)) {
        this.regions.put(regionInfo, sn);
        addToServers(sn, regionInfo);
        this.regions.notifyAll();
      } else {
        LOG.info("The server is not in online servers, ServerName=" + 
          sn.getServerName() + ", region=" + regionInfo.getEncodedName());
      }
{code}


Even though the dead region server was carrying the ROOT region, the check 
returns false. After that the ROOT region is never assigned again.
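
The ordering problem can be reproduced in a small standalone model. This is 
only a sketch and not HBase code: the class RootRaceSketch and all of its 
methods are hypothetical stand-ins for the ZK transition znodes, the online 
servers list and the AM online regions map, and it assumes the carrying check 
passes when either the znode or the AM points at the server being checked.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the race. All names here are hypothetical, not HBase classes. */
class RootRaceSketch {
    // Stand-in for the region transition znodes: region -> server that opened it.
    private final Map<String, String> transitionZnodes = new HashMap<>();
    // Stand-in for the master's list of online region servers.
    private final Set<String> onlineServers = new HashSet<>();
    // Stand-in for the AssignmentManager's online regions: region -> server.
    private final Map<String, String> onlineRegions = new HashMap<>();

    void serverStarted(String server) { onlineServers.add(server); }
    void serverExpired(String server) { onlineServers.remove(server); }
    void regionOpenedInZk(String region, String server) { transitionZnodes.put(region, server); }
    void deleteTransitionZnode(String region) { transitionZnodes.remove(region); }

    // Mirrors the guard quoted above: the region is recorded only if its server is online.
    void regionOnline(String region, String server) {
        if (onlineServers.contains(server)) {
            onlineRegions.put(region, server);
        }
    }

    // Assumed shape of the check: true if either the znode or the AM names this server.
    boolean isCarryingRegion(String server, String region) {
        boolean matchZK = server.equals(transitionZnodes.get(region));
        boolean matchAM = server.equals(onlineRegions.get(region));
        return matchZK || matchAM;
    }

    /** Order seen in the logs: znode deleted, server expires, then regionOnline runs. */
    static boolean simulateRace() {
        RootRaceSketch m = new RootRaceSketch();
        String rs = "HOST-10-18-40-69,60020,1384173244404";
        String root = "-ROOT-,,0.70236052";
        m.serverStarted(rs);
        m.regionOpenedInZk(root, rs);
        m.deleteTransitionZnode(root);                     // OPENED handled, znode removed
        m.serverExpired(rs);                               // RS expires in the gap
        boolean carrying = m.isCarryingRegion(rs, root);   // neither source knows the region
        m.regionOnline(root, rs);                          // skipped: server is not online
        return carrying;
    }

    /** Same events, but the region is added to online regions before the expiry. */
    static boolean simulateNormalOrder() {
        RootRaceSketch m = new RootRaceSketch();
        String rs = "HOST-10-18-40-69,60020,1384173244404";
        String root = "-ROOT-,,0.70236052";
        m.serverStarted(rs);
        m.regionOpenedInZk(root, rs);
        m.deleteTransitionZnode(root);
        m.regionOnline(root, rs);                          // recorded while server is online
        m.serverExpired(rs);
        return m.isCarryingRegion(rs, root);
    }

    public static void main(String[] args) {
        System.out.println("race order, carrying root: " + simulateRace());
        System.out.println("normal order, carrying root: " + simulateNormalOrder());
    }
}
```

In this model the race order reports that the server was not carrying the 
region, while the normal order reports that it was. That matches the 
root=false line in the logs below.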

Here are the logs:
{code}
2013-11-11 18:04:14,730 INFO 
org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region 
location in ZooKeeper
2013-11-11 18:04:14,775 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
No previous transition plan was found (or we are ignoring an existing plan) for 
-ROOT-,,0.70236052 so generated a random one; hri=-ROOT-,,0.70236052, src=, 
dest=HOST-10-18-40-69,60020,1384173244404; 1 (online=1, available=1) available 
servers
2013-11-11 18:04:14,809 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region -ROOT-,,0.70236052 to HOST-10-18-40-69,60020,1384173244404
2013-11-11 18:04:18,375 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
Looked up root region location, 
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@12133926;
 serverName=HOST-10-18-40-69,60020,1384173244404
2013-11-11 18:04:26,213 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, 
server=HOST-10-18-40-69,60020,1384173244404, region=70236052/-ROOT-
2013-11-11 18:04:26,213 INFO 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
event for -ROOT-,,0.70236052 from HOST-10-18-40-69,60020,1384173244404; 
deleting unassigned node
2013-11-11 18:04:31,553 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
based on AM, current region=-ROOT-,,0.70236052 is on server=null server being 
checked: HOST-10-18-40-69,60020,1384173244404
2013-11-11 18:04:31,561 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
Added=HOST-10-18-40-69,60020,1384173244404 to dead servers, submitted shutdown 
handler to be executed, root=false, meta=false
{code}
{code}
2013-11-11 18:04:32,323 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
The znode of region -ROOT-,,0.70236052 has been deleted.
2013-11-11 18:04:32,323 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
The server is not in online servers, 
ServerName=HOST-10-18-40-69,60020,1384173244404, region=70236052
2013-11-11 18:04:32,323 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
The master has opened the region -ROOT-,,0.70236052 that was online on 
HOST-10-18-40-69,60020,1384173244404
{code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)
