.META. getting stuck if RS hosting it is dead and znode state is in 
RS_ZK_REGION_OPENED
---------------------------------------------------------------------------------------

                 Key: HBASE-4400
                 URL: https://issues.apache.org/jira/browse/HBASE-4400
             Project: HBase
          Issue Type: Bug
            Reporter: ramkrishna.s.vasudevan
            Assignee: ramkrishna.s.vasudevan
             Fix For: 0.92.0, 0.90.5


Start two region servers (RS1 and RS2).
.META. is hosted by RS2, but RS2 goes down during processing, leaving the
znode in RS_ZK_REGION_OPENED.

Now restart the master and RS1.  The master reads the RS name from the znode,
which is in RS_ZK_REGION_OPENED.  Because RS2 is still not online, the master
is unable to process .META. at all.  The logs follow:
{noformat}
2011-09-14 16:43:51,949 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1315998828523, 
region=70236052/-ROOT-
2011-09-14 16:43:51,968 INFO org.apache.hadoop.hbase.master.HMaster: -ROOT- 
assigned=1, rit=false, location=linux76:60020
2011-09-14 16:43:51,970 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Processing region .META.,,1.1028785192 in state RS_ZK_REGION_OPENED
2011-09-14 16:43:51,970 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Failed to find linux146,60020,1315998414623 in list of online servers; skipping 
registration of open of .META.,,1.1028785192
2011-09-14 16:43:51,971 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Waiting on 1028785192/.META.
2011-09-14 16:43:51,983 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, server=linux76,60020,1315998828523, 
region=70236052/-ROOT-
2011-09-14 16:43:51,986 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
event for 70236052; deleting unassigned node
2011-09-14 16:43:51,986 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:60000-0x13267854032001d Deleting existing unassigned node for 70236052 
that is in expected state RS_ZK_REGION_OPENED
2011-09-14 16:43:51,998 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:60000-0x13267854032001d Successfully deleted unassigned node for region 
70236052 in expected state RS_ZK_REGION_OPENED
2011-09-14 16:43:51,999 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
-ROOT-,,0.70236052 on linux76,60020,1315998828523
2011-09-14 16:44:00,945 INFO org.apache.hadoop.hbase.master.ServerManager: 
Registering server=linux146,60020,1315998839724, regionCount=0, userLoad=false
2011-09-14 16:46:20,003 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out:  .META.,,1.1028785192 state=OPEN, ts=0
2011-09-14 16:46:20,004 ERROR org.apache.hadoop.hbase.master.AssignmentManager: 
Region has been OPEN for too long, we don't know where region was opened so 
can't do anything
{noformat}

{code}
        regionsInTransition.put(encodedRegionName, new RegionState(
            regionInfo, RegionState.State.OPEN, data.getStamp()));
          ................
        } else {
          HServerInfo hsi = this.serverManager.getServerInfo(sn);
          if (hsi == null) {
            LOG.info("Failed to find " + sn +
              " in list of online servers; skipping registration of open of " +
              regionInfo.getRegionNameAsString());
          } else {
            new OpenedRegionHandler(master, this, regionInfo, hsi).process();
          }
        }
{code}
So the timeout monitor is not able to do anything here:
{code}
          LOG.error("Region has been OPEN for too long, " +
          "we don't know where region was opened so can't do anything");
          synchronized(regionState) {
            regionState.update(regionState.getState());
          }
{code}
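
To make the failure mode concrete, here is a minimal, self-contained sketch of the decision the master faces when it finds a znode in RS_ZK_REGION_OPENED at startup. It is a toy model, not HBase code: the class OpenedZnodeRecoverySketch, the Action enum and the onOpenedZnode method are hypothetical names, and reassigning the region when the recorded server is offline is only one possible way to handle this, not necessarily the fix committed for this issue.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model (not HBase code) of handling an OPENED znode during master startup. */
class OpenedZnodeRecoverySketch {

    /** Hypothetical outcomes; these are not HBase types. */
    enum Action { REGISTER_OPEN, REASSIGN }

    /**
     * Decide what to do with a region whose znode is in RS_ZK_REGION_OPENED.
     *
     * @param serverName    server name recorded in the znode (host,port,startcode)
     * @param onlineServers servers currently registered with the master
     */
    static Action onOpenedZnode(String serverName, Set<String> onlineServers) {
        if (onlineServers.contains(serverName)) {
            // The server that opened the region is alive: finish the open.
            return Action.REGISTER_OPEN;
        }
        // In the reported behaviour the master only logs "Failed to find ..."
        // here, so the region stays in transition as OPEN with no known
        // location. Assumption in this sketch: treat the server as dead and
        // ask for the region to be assigned again.
        return Action.REASSIGN;
    }

    public static void main(String[] args) {
        Set<String> online = new HashSet<>();
        online.add("linux76,60020,1315998828523");

        // -ROOT- was opened on a live server.
        System.out.println(onOpenedZnode("linux76,60020,1315998828523", online));
        // .META. was opened on an instance of linux146 that is no longer online.
        System.out.println(onOpenedZnode("linux146,60020,1315998414623", online));
    }
}
```

Note that in the logs above linux146 registers again later with a different start code (1315998839724), so a lookup by the full server name recorded in the znode still fails after the restart.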


        
