[ https://issues.apache.org/jira/browse/HBASE-24089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Y. SREENIVASULU REDDY updated HBASE-24089:
------------------------------------------
    Description: 
During a rolling upgrade, we performed a set of operations that left regions stuck in RIT.
Pre-requisites:
Configure the following properties on the HBase 1.3.1 cluster:
{noformat}
 <property>
    <name>hbase.assignment.usezk</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.assignment.usezk.migrating</name>
    <value>true</value>
  </property>
{noformat}
Configure the following properties on the HBase 2.2.x cluster:
{noformat}
 <property>
    <name>hbase.mirror.table.state.to.zookeeper</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.migrate.table.state.from.zookeeper</name>
    <value>true</value>
  </property>
{noformat}

Steps to reproduce the problem (a client-side sketch of the same sequence follows the list):
{noformat}
1. Start the HBase cluster with version 1.3.1 (1 master and 1 regionserver).
2. Start a regionserver of the 2.2.x version (1 regionserver).
3. Create a table with one region (ensure the region lands on the old-version RS).
4. Write some data into the table.
5. Flush the table.
6. Create a snapshot of the table.
7. Move the table region from the old-version RS to the new-version RS.
8. Disable the table.
9. Restore the snapshot on the table.
10. Enable the table.
{noformat}
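
For reference, the sequence above can be driven from a Java client roughly as in the sketch below. This is a minimal illustration only: the column family {{cf}}, the snapshot name, and the encoded-region/destination-server placeholders are assumptions, not values from the original report.
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class RestoreSnapshotRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName tn = TableName.valueOf("usertable01");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // 3. create the table with one region ("cf" is an assumed column family)
      admin.createTable(new HTableDescriptor(tn).addFamily(new HColumnDescriptor("cf")));
      // 4. write some data into the table
      try (Table table = conn.getTable(tn)) {
        table.put(new Put(Bytes.toBytes("user1089"))
            .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v")));
      }
      // 5. flush the table, 6. create a snapshot
      admin.flush(tn);
      admin.snapshot("usertable01-snap", tn);
      // 7. move the region to the 2.2.x RS (placeholders, fill in real values)
      admin.move(Bytes.toBytes("<encoded-region-name>"),
                 Bytes.toBytes("<2.2.x-rs,port,startcode>"));
      // 8. disable, 9. restore the snapshot, 10. enable
      admin.disableTable(tn);
      admin.restoreSnapshot("usertable01-snap");
      admin.enableTable(tn);
    }
  }
}
{noformat}
The same sequence can also be run from the hbase shell with the create, put, flush, snapshot, move, disable, restore_snapshot and enable commands.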
After the enable table operation was triggered, the HBase 1.3.1 master assigned the
region to the HBase 1.3.1 RegionServer, and the RS failed to open the region:
{noformat}
2020-03-18 21:10:45,103 WARN  [RS_OPEN_REGION-vm1:16040-17] zookeeper.ZKAssign: 
regionserver:16040-0x200431c58cf0012, quorum=vm1:2181,vm2:2181,vm3:2181, 
baseZNode=/hbase Attempt to transition the unassigned node for 
505f0e1d96a2a06eb111bd8b923a5a87 from M_ZK_REGION_OFFLINE to 
RS_ZK_REGION_OPENING failed, the server that tried to transition was 
vm1,16040,1584536385246 not the expected vm2,16040,1584536781189
2020-03-18 21:10:45,104 WARN  [RS_OPEN_REGION-vm1:16040-17] 
coordination.ZkOpenRegionCoordination: Failed transition from OFFLINE to 
OPENING for region=505f0e1d96a2a06eb111bd8b923a5a87
2020-03-18 21:10:45,104 WARN  [RS_OPEN_REGION-vm1:16040-17] 
handler.OpenRegionHandler: Region was hijacked? Opening cancelled for 
encodedName=505f0e1d96a2a06eb111bd8b923a5a87
2020-03-18 21:10:45,104 INFO  [RS_OPEN_REGION-vm1:16040-17] 
coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 
505f0e1d96a2a06eb111bd8b923a5a87, NAME => 
'usertable01,user1089,1584535968752.505f0e1d96a2a06eb111bd8b923a5a87.', 
STARTKEY => 'user1089', ENDKEY => 'user1134'} failed, transitioning from 
OFFLINE to FAILED_OPEN in ZK, expecting version 0
2020-03-18 21:10:45,104 DEBUG [RS_OPEN_REGION-vm1:16040-17] zookeeper.ZKAssign: 
regionserver:16040-0x200431c58cf0012, quorum=vm1:2181,vm2:2181,vm3:2181, 
baseZNode=/hbase Transitioning 505f0e1d96a2a06eb111bd8b923a5a87 from 
M_ZK_REGION_OFFLINE to RS_ZK_REGION_FAILED_OPEN
{noformat}

Looking a little deeper into the problem, we found that the HMaster had failed to
delete the ZNode during the table disable operation:
{noformat}
2020-03-18 21:10:02,219 DEBUG 
[RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] 
master.AssignmentManager: Table being disabled so deleting ZK node and removing 
from regions in transition, skipping assignment of region 
usertable01,user1089,1584535968752.505f0e1d96a2a06eb111bd8b923a5a87.
2020-03-18 21:10:02,220 WARN  
[RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] zookeeper.ZKAssign: 
master:16000-0x100431c3faf0012, quorum=vm1:2181,vm2:2181,vm3:2181, 
baseZNode=/hbase Attempting to delete unassigned node 
505f0e1d96a2a06eb111bd8b923a5a87 in RS_ZK_REGION_CLOSED state but node is in 
M_ZK_REGION_CLOSING state
2020-03-18 21:10:02,221 WARN  
[RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] zookeeper.ZKAssign: 
master:16000-0x100431c3faf0012, quorum=vm1:2181,vm2:2181,vm3:2181, 
baseZNode=/hbase Attempting to delete unassigned node 
505f0e1d96a2a06eb111bd8b923a5a87 in M_ZK_REGION_OFFLINE state but node is in 
M_ZK_REGION_CLOSING state
2020-03-18 21:10:02,221 INFO  
[RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] 
master.AssignmentManager: Failed to delete the closed node for 
505f0e1d96a2a06eb111bd8b923a5a87. The node type may not match
{noformat}

The region was closed successfully, and the new-version RS reported the region transition
back to the master over RPC. But the master expects the ZNode state to have been modified
by the RS, and it deletes the ZNode only based on those ZK states. Since the 2.2.x RS never
moved the node out of M_ZK_REGION_CLOSING, the delete failed and the stale ZNode later made
the open on the 1.3.1 RS fail, leaving the region in RIT.
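
To make the mismatch concrete, below is a hypothetical, simplified illustration of the master-side compare-then-delete against the unassigned znode, written against the plain ZooKeeper client rather than the actual ZKAssign/AssignmentManager code. The master deletes the node only when it finds it in the state it expects the RS to have written (e.g. RS_ZK_REGION_CLOSED); since the 2.2.x RS reports the close over RPC and never touches the znode, the node stays in M_ZK_REGION_CLOSING and the delete is skipped, exactly as in the log above.
{noformat}
import java.nio.charset.StandardCharsets;

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Hypothetical, simplified sketch of the expected-state check; the real logic lives in
// ZKAssign/AssignmentManager and works on a protobuf-serialized RegionTransition payload.
public class UnassignedNodeCleanupSketch {

  /** Delete baseZNode/region-in-transition/<encodedName> only if it is in expectedState. */
  static boolean deleteIfInExpectedState(ZooKeeper zk, String encodedName,
                                         String expectedState) throws Exception {
    String path = "/hbase/region-in-transition/" + encodedName;
    Stat stat = new Stat();
    byte[] data = zk.getData(path, false, stat);
    // For illustration we pretend the payload is just the state name.
    String currentState = new String(data, StandardCharsets.UTF_8);
    if (!expectedState.equals(currentState)) {
      // Branch hit in the logs above: the master expects RS_ZK_REGION_CLOSED (then
      // M_ZK_REGION_OFFLINE), but the node is still in M_ZK_REGION_CLOSING because the
      // 2.2.x RS reported the close over RPC and never updated ZooKeeper.
      return false;
    }
    zk.delete(path, stat.getVersion());  // versioned delete of the unassigned node
    return true;
  }
}
{noformat}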


> Rolling Upgrade: Regions are in RIT during enabling the table after 
> restore_snapshot
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-24089
>                 URL: https://issues.apache.org/jira/browse/HBASE-24089
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>    Affects Versions: 1.3.4, 1.3.6
>            Reporter: Y. SREENIVASULU REDDY
>            Priority: Major
>


