rajeshbabu created HBASE-9968:
---------------------------------
Summary: Cluster is non operative if the RS carrying -ROOT- is
expiring after deleting -ROOT- region transition znode and before adding it to
online regions.
Key: HBASE-9968
URL: https://issues.apache.org/jira/browse/HBASE-9968
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 0.94.11
Reporter: rajeshbabu
Assignee: rajeshbabu
When we check whether the dead region is carrying root or meta, first we will
check any transition znode for the region is there or not. In this case it got
deleted. So from zookeeper we cannot find the region location.
{code}
try {
data = ZKAssign.getData(master.getZooKeeper(), hri.getEncodedName());
} catch (KeeperException e) {
master.abort("Unexpected ZK exception reading unassigned node for region="
+ hri.getEncodedName(), e);
}
{code}
Now we will check from the AssignmentManager whether its in online regions or
not
{code}
ServerName addressFromAM = getRegionServerOfRegion(hri);
boolean matchAM = (addressFromAM != null &&
addressFromAM.equals(serverName));
LOG.debug("based on AM, current region=" + hri.getRegionNameAsString() +
" is on server=" + (addressFromAM != null ? addressFromAM : "null") +
" server being checked: " + serverName);
{code}
>From AM we will get null because while adding region to online regions we
>will check whether the RS is in onlineservers or not and if not we will not
>add the region to online regions.
{code}
if (isServerOnline(sn)) {
this.regions.put(regionInfo, sn);
addToServers(sn, regionInfo);
this.regions.notifyAll();
} else {
LOG.info("The server is not in online servers, ServerName=" +
sn.getServerName() + ", region=" + regionInfo.getEncodedName());
}
{code}
Even though the dead regionserver carrying ROOT region, its returning false.
After that ROOT region never assigned.
Here are the logs
{code}
2013-11-11 18:04:14,730 INFO
org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region
location in ZooKeeper
2013-11-11 18:04:14,775 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
No previous transition plan was found (or we are ignoring an existing plan) for
-ROOT-,,0.70236052 so generated a random one; hri=-ROOT-,,0.70236052, src=,
dest=HOST-10-18-40-69,60020,1384173244404; 1 (online=1, available=1) available
servers
2013-11-11 18:04:14,809 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Assigning region -ROOT-,,0.70236052 to HOST-10-18-40-69,60020,1384173244404
2013-11-11 18:04:18,375 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Looked up root region location,
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@12133926;
serverName=HOST-10-18-40-69,60020,1384173244404
2013-11-11 18:04:26,213 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_OPENED,
server=HOST-10-18-40-69,60020,1384173244404, region=70236052/-ROOT-
2013-11-11 18:04:26,213 INFO
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED
event for -ROOT-,,0.70236052 from HOST-10-18-40-69,60020,1384173244404;
deleting unassigned node
2013-11-11 18:04:31,553 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
based on AM, current region=-ROOT-,,0.70236052 is on server=null server being
checked: HOST-10-18-40-69,60020,1384173244404
2013-11-11 18:04:31,561 DEBUG org.apache.hadoop.hbase.master.ServerManager:
Added=HOST-10-18-40-69,60020,1384173244404 to dead servers, submitted shutdown
handler to be executed, root=false, meta=false
{code}
{code}
2013-11-11 18:04:32,323 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
The znode of region -ROOT-,,0.70236052 has been deleted.
2013-11-11 18:04:32,323 INFO org.apache.hadoop.hbase.master.AssignmentManager:
The server is not in online servers,
ServerName=HOST-10-18-40-69,60020,1384173244404, region=70236052
2013-11-11 18:04:32,323 INFO org.apache.hadoop.hbase.master.AssignmentManager:
The master has opened the region -ROOT-,,0.70236052 that was online on
HOST-10-18-40-69,60020,1384173244404
{code}
--
This message was sent by Atlassian JIRA
(v6.1#6144)