[ 
https://issues.apache.org/jira/browse/HBASE-29364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiwen Deng updated HBASE-29364:
--------------------------------
    Description: 
We have encountered multiple instances where regions were opened on 
RegionServers (RS) that had already been offlined. It wasn't until recently 
that we discovered a potential cause for this issue, and the details of the 
problem are as follows:

Our HDFS storage reached its capacity limit, which left the Master and the 
RegionServers running on top of it unable to write, so they aborted. We then 
manually deleted some data and HDFS recovered. Afterwards, the hbck report 
showed that some regions were opened on offline RegionServers, which left 
these regions unable to serve requests. We finally used HBCK2 to assign these 
regions, which resolved the problem.
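
To make the symptom concrete, here is a minimal sketch of the kind of check 
that flags such regions. It is illustrative only and is not HBase or hbck 
code: the class name, the method name, and the second region and server in 
the sample data are hypothetical. It assumes server names in the 
host,port,startcode form, so a restarted RS on the same host and port counts 
as a different server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative helper, not HBase code. */
final class OfflineLocationCheck {

    /**
     * Returns the encoded names of regions whose recorded location is not in
     * the set of live servers, in sorted order.
     */
    static List<String> regionsOnOfflineServers(Map<String, String> regionLocations,
                                                Set<String> liveServers) {
        List<String> stranded = new ArrayList<>();
        // Iterate over a sorted copy so the result order is deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(regionLocations).entrySet()) {
            // Exact match on host,port,startcode: a restarted RS has a new
            // start code and therefore counts as a different server.
            if (!liveServers.contains(e.getValue())) {
                stranded.add(e.getKey());
            }
        }
        return stranded;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; only the first region name and the first
        // server name are taken from this report.
        Map<String, String> locations = Map.of(
            "19f709990ad65ce3d51ddeaf29acf436", "rs-hostname,20700,1747777624803",
            "hypothetical-region-b", "rs-hostname,20700,1747800000000");
        Set<String> live = Set.of("rs-hostname,20700,1747800000000");
        System.out.println(regionsOnOfflineServers(locations, live));
    }
}
```

Regions returned by such a check are the ones that would need to be 
reassigned, which is what we did with HBCK2.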

Here is the analysis of the state transitions for one specific region, 
19f709990ad65ce3d51ddeaf29acf436:
 * 2025-05-21, 05:48:11: The region was assigned to 
rs-hostname,20700,1747777624803, but due to some anomalies it could not be 
opened on the target RS. The RS then reported the failed open to the Master:

{code:java}
2025-05-21,05:48:11,646 INFO 
[RpcServer.priority.RWQ.Fifo.write.handler=2,queue=0,port=20600] 
org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase: Received 
report from rs-hostname,20700,1747777624803, transitionCode=FAILED_OPEN, 
seqId=-1, regionNode=state=OPENING, location=rs-hostname,20700,1747777624803, 
table=test:xxx, region=19f709990ad65ce3d51ddeaf29acf436, proc=pid=78499, 
ppid=78034, state=RUNNABLE; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
{code}
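
When reading reports like the one above, it can help to split the server 
name into its host, port, and start code, since the start code is the 
timestamp (in milliseconds) at which that RS instance started. The following 
is a rough sketch using only the JDK. The class name, method names, and 
regular expression are assumptions made for illustration and are not part of 
HBase.

```java
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for a region state report log line; not HBase code. */
final class ReportedServer {
    // Matches the "report from <host,port,startcode>, transitionCode=<CODE>"
    // part of the log line.
    private static final Pattern REPORT =
        Pattern.compile("report from ([^,\\s]+,\\d+,\\d+), transitionCode=(\\w+)");

    final String host;
    final int port;
    final long startCode;
    final String transitionCode;

    private ReportedServer(String host, int port, long startCode, String transitionCode) {
        this.host = host;
        this.port = port;
        this.startCode = startCode;
        this.transitionCode = transitionCode;
    }

    /** Extracts the reporting server and transition code from one log line. */
    static ReportedServer fromLogLine(String line) {
        Matcher m = REPORT.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no region state report in: " + line);
        }
        String[] parts = m.group(1).split(",");
        return new ReportedServer(parts[0], Integer.parseInt(parts[1]),
            Long.parseLong(parts[2]), m.group(2));
    }

    /** The start code interpreted as epoch milliseconds. */
    Instant startedAt() {
        return Instant.ofEpochMilli(startCode);
    }

    public static void main(String[] args) {
        String line = "Received report from rs-hostname,20700,1747777624803, "
            + "transitionCode=FAILED_OPEN, seqId=-1";
        ReportedServer rs = fromLogLine(line);
        System.out.println(rs.host + ":" + rs.port + " started at " + rs.startedAt()
            + ", reported " + rs.transitionCode);
    }
}
```

For the line above this gives transitionCode FAILED_OPEN and a start instant 
of 2025-05-20T21:47:04.803Z. If the log timestamps are in UTC+8 (an 
assumption; the time zone is not stated here), that RS instance had started 
about a minute before it reported the failed open.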
 
 

  was:
We have encountered multiple instances where regions were opened on 
RegionServers (RS) that had already been offlined. It wasn't until recently 
that we discovered a potential cause for this issue, and the details of the 
problem are as follows:
Our HDFS storage reached its capacity limit, which left the Master and the 
RegionServers running on top of it unable to write, so they aborted. We then 
manually deleted some data and HDFS recovered. Afterwards, the hbck report 
showed that some regions were opened on offline RegionServers, which left 
these regions unable to serve requests. We finally used HBCK2 to assign these 
regions, which resolved the problem.
 


> Region will be opened in unknown regionserver when master is changed & rs 
> crashed
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-29364
>                 URL: https://issues.apache.org/jira/browse/HBASE-29364
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.3.0
>            Reporter: Zhiwen Deng
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
