Y. SREENIVASULU REDDY created HBASE-24089:
---------------------------------------------

             Summary: Rolling Upgrade: Regions are in RIT during enabling the 
table after restore_snapshot
                 Key: HBASE-24089
                 URL: https://issues.apache.org/jira/browse/HBASE-24089
             Project: HBase
          Issue Type: Bug
          Components: amv2
    Affects Versions: 1.3.6, 1.3.4
            Reporter: Y. SREENIVASULU REDDY


During a rolling upgrade, we performed a sequence of operations that left 
regions stuck in RIT (regions in transition).
Prerequisites:
Configure the following properties in HBase 1.3.1:
{noformat}
  <property>
    <name>hbase.assignment.usezk</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.assignment.usezk.migrating</name>
    <value>true</value>
  </property>
{noformat}
Configure the following properties in HBase 2.2.x:
{noformat}
  <property>
    <name>hbase.mirror.table.state.to.zookeeper</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.migrate.table.state.from.zookeeper</name>
    <value>true</value>
  </property>
{noformat}

Steps to reproduce the problem:
1. Start the HBase cluster on version 1.3.1 (1 master and 1 regionserver).
2. Start one regionserver on version 2.2.x.
3. Create a table with one region (ensure the region is on the old-version 
RS).
4. Write some data into the table.
5. Flush the table.
6. Create a snapshot of the table.
7. Move the table region from the old-version RS to the new-version RS.
8. Disable the table.
9. Restore the snapshot on the table.
10. Enable the table.

After the enable table operation was triggered, the HBase 1.3.1 master 
assigned the region to the HBase 1.3.1 regionserver, and the RS failed to open 
the region:
{noformat}
2020-03-18 21:10:45,103 WARN  [RS_OPEN_REGION-BLR1000030601:16040-17] 
zookeeper.ZKAssign: regionserver:16040-0x200431c58cf0012, 
quorum=10.18.21.118:2181,10.18.21.109:2181,10.18.21.236:2181, baseZNode=/hbase 
Attempt to transition the unassigned node for 505f0e1d96a2a06eb111bd8b923a5a87 
from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the server that tried 
to transition was blr1000030601,16040,1584536385246 not the expected 
blr1000030600,16040,1584536781189
2020-03-18 21:10:45,104 WARN  [RS_OPEN_REGION-BLR1000030601:16040-17] 
coordination.ZkOpenRegionCoordination: Failed transition from OFFLINE to 
OPENING for region=505f0e1d96a2a06eb111bd8b923a5a87
2020-03-18 21:10:45,104 WARN  [RS_OPEN_REGION-BLR1000030601:16040-17] 
handler.OpenRegionHandler: Region was hijacked? Opening cancelled for 
encodedName=505f0e1d96a2a06eb111bd8b923a5a87
2020-03-18 21:10:45,104 INFO  [RS_OPEN_REGION-BLR1000030601:16040-17] 
coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 
505f0e1d96a2a06eb111bd8b923a5a87, NAME => 
'usertable01,user1089,1584535968752.505f0e1d96a2a06eb111bd8b923a5a87.', 
STARTKEY => 'user1089', ENDKEY => 'user1134'} failed, transitioning from 
OFFLINE to FAILED_OPEN in ZK, expecting version 0
2020-03-18 21:10:45,104 DEBUG [RS_OPEN_REGION-BLR1000030601:16040-17] 
zookeeper.ZKAssign: regionserver:16040-0x200431c58cf0012, 
quorum=10.18.21.118:2181,10.18.21.109:2181,10.18.21.236:2181, baseZNode=/hbase 
Transitioning 505f0e1d96a2a06eb111bd8b923a5a87 from M_ZK_REGION_OFFLINE to 
RS_ZK_REGION_FAILED_OPEN
{noformat}
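
The rejection in the first WARN line above can be illustrated with a small 
self-contained model. This is only a sketch based on the log message: the 
class UnassignedNodeModel and its methods are hypothetical and are not the 
HBase ZKAssign API. It assumes the transition is refused whenever the server 
attempting it differs from the server name recorded in the znode.

```java
/** Toy model of an unassigned-region znode. Hypothetical; not HBase's ZKAssign API. */
class UnassignedNodeModel {
    enum State { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING }

    private State state;
    private final String expectedServer; // server name stored in the znode (assumed)

    UnassignedNodeModel(State state, String expectedServer) {
        this.state = state;
        this.expectedServer = expectedServer;
    }

    State state() {
        return state;
    }

    /** Applies from -> to only when the node state and the calling server both match. */
    boolean transition(String caller, State from, State to) {
        if (state != from || !expectedServer.equals(caller)) {
            return false; // rejected; the node is left unchanged
        }
        state = to;
        return true;
    }

    public static void main(String[] args) {
        // Server names are taken from the log excerpt above.
        UnassignedNodeModel node = new UnassignedNodeModel(
            State.M_ZK_REGION_OFFLINE, "blr1000030600,16040,1584536781189");
        boolean accepted = node.transition("blr1000030601,16040,1584536385246",
            State.M_ZK_REGION_OFFLINE, State.RS_ZK_REGION_OPENING);
        System.out.println("transition accepted: " + accepted
            + ", node state: " + node.state());
    }
}
```

In this model the znode left over from the earlier disable still names a 
different server, so the open attempt by the newly assigned RS is refused and 
the node stays in M_ZK_REGION_OFFLINE.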

Looking a little deeper into the problem, we found that the HMaster failed to 
delete the znode during the table disable operation:
{noformat}
2020-03-18 21:10:02,219 DEBUG 
[RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] 
master.AssignmentManager: Table being disabled so deleting ZK node and removing 
from regions in transition, skipping assignment of region 
usertable01,user1089,1584535968752.505f0e1d96a2a06eb111bd8b923a5a87.
2020-03-18 21:10:02,220 WARN  
[RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] zookeeper.ZKAssign: 
master:16000-0x100431c3faf0012, 
quorum=10.18.21.118:2181,10.18.21.109:2181,10.18.21.236:2181, baseZNode=/hbase 
Attempting to delete unassigned node 505f0e1d96a2a06eb111bd8b923a5a87 in 
RS_ZK_REGION_CLOSED state but node is in M_ZK_REGION_CLOSING state
2020-03-18 21:10:02,221 WARN  
[RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] zookeeper.ZKAssign: 
master:16000-0x100431c3faf0012, 
quorum=10.18.21.118:2181,10.18.21.109:2181,10.18.21.236:2181, baseZNode=/hbase 
Attempting to delete unassigned node 505f0e1d96a2a06eb111bd8b923a5a87 in 
M_ZK_REGION_OFFLINE state but node is in M_ZK_REGION_CLOSING state
2020-03-18 21:10:02,221 INFO  
[RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] 
master.AssignmentManager: Failed to delete the closed node for 
505f0e1d96a2a06eb111bd8b923a5a87. The node type may not match
{noformat}

The region was closed successfully, and the new-version RS reported the region 
transition to the master through an RPC call. However, the 1.3.1 master 
expects the RS to update the znode state, and it deletes the znode only based 
on that state. Since the znode stayed in M_ZK_REGION_CLOSING, it was never 
deleted.
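
The mismatch can be sketched with a small self-contained model. It is an 
illustration built from the log messages above, not HBase code: the class 
ClosedNodeCleanupModel and its method names are hypothetical. It assumes the 
master deletes the znode only when it finds it in RS_ZK_REGION_CLOSED or 
M_ZK_REGION_OFFLINE, the two states tried in the log.

```java
/** Toy model of the master's znode cleanup on disable. Hypothetical; not HBase code. */
class ClosedNodeCleanupModel {
    enum State { M_ZK_REGION_CLOSING, RS_ZK_REGION_CLOSED, M_ZK_REGION_OFFLINE }

    private State nodeState; // null once the znode has been deleted

    ClosedNodeCleanupModel(State initial) {
        this.nodeState = initial;
    }

    boolean exists() {
        return nodeState != null;
    }

    /** ZK-based RS behaviour (assumed): the RS rewrites the znode state after closing. */
    void regionServerMarksClosedInZk() {
        nodeState = State.RS_ZK_REGION_CLOSED;
    }

    /** Deletes the znode only if it is currently in the expected state. */
    boolean deleteIfInState(State expected) {
        if (nodeState != expected) {
            return false;
        }
        nodeState = null;
        return true;
    }

    /** Mirrors the two attempts in the log: CLOSED first, then OFFLINE. */
    boolean deleteClosedOrOfflineNode() {
        return deleteIfInState(State.RS_ZK_REGION_CLOSED)
            || deleteIfInState(State.M_ZK_REGION_OFFLINE);
    }

    public static void main(String[] args) {
        // RS reports the close over RPC only, so the znode stays in CLOSING.
        ClosedNodeCleanupModel rpcOnly =
            new ClosedNodeCleanupModel(State.M_ZK_REGION_CLOSING);
        System.out.println("RPC-only RS, node deleted: "
            + rpcOnly.deleteClosedOrOfflineNode());

        // RS updates the znode itself, so the master finds it in CLOSED.
        ClosedNodeCleanupModel zkBased =
            new ClosedNodeCleanupModel(State.M_ZK_REGION_CLOSING);
        zkBased.regionServerMarksClosedInZk();
        System.out.println("ZK-based RS, node deleted: "
            + zkBased.deleteClosedOrOfflineNode());
    }
}
```

In the first case the stale znode survives the disable, which matches the 
"Failed to delete the closed node" message in the master log.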


