[ https://issues.apache.org/jira/browse/HBASE-27773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Beitch updated HBASE-27773:
---------------------------------
    Attachment:     (was: Screenshot 2023-04-04 at 4.35.05 PM.png)

> STUCK Region-In-Transition state
> --------------------------------
>
>                 Key: HBASE-27773
>                 URL: https://issues.apache.org/jira/browse/HBASE-27773
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.4.11
>         Environment: HBase: 2.4.11
> Hadoop: 3.2.4
> ZooKeeper: 3.7.1
>            Reporter: Grigore Lupescu
>            Priority: Major
>         Attachments: config.txt
>
>
> One problem we encounter regularly is regions stuck in transition, reported 
> as `STUCK Region-In-Transition state=OPENING`.
> We have a three-server cluster that runs a full HBase stack: 3 ZooKeeper 
> nodes, an active and a standby HBase master, 3 region servers, and 3 HDFS 
> DataNodes.
> We have managed to reproduce the stuck region-in-transition state by 
> rebooting one of the 3 nodes at random. This is not necessarily the only way 
> the cluster can end up in this state; it is simply the most reliable way we 
> have found to reproduce it, although it does not trigger the problem every 
> time. The stuck state is more likely to occur if (a) data is being written to 
> HBase while the node reboots, or (b) the rebooted node is also hosting the 
> active HBase master.
> Sample logs:
>  
> {code:java}
> [7745.457s][info][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause) 523M->44M(818M) 12.736ms
> [10505.454s][info][gc] GC(13) Pause Young (Normal) (G1 Evacuation Pause) 523M->44M(818M) 11.066ms
> 2023-04-03 11:26:53,208 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3
> 2023-04-03 11:27:53,208 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=78be037bae2fc201707fa511e90dfbbf
> 2023-04-03 11:27:53,208 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3
> 2023-04-03 11:28:53,145 INFO  [master/cvp504:16000.Chore.1] master.HMaster: Not running balancer (force=false, metaRIT=false) because 2 region(s) in transition: [state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=78be037bae2fc201707fa511e90dfbbf, state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3]
> 2023-04-03 11:28:53,168 WARN  [master/cvp504:16000.Chore.1] janitor.CatalogJanitor: unknown_server=cvp503.sjc.aristanetworks.com,16201,1680499899167/aeris_v2,\x09,1680499940070.78be037bae2fc201707fa511e90dfbbf.,
> unknown_server=cvp503.sjc.aristanetworks.com,16201,1680499899167/aeris_v2,\x12,1680499940070.b732898573f935b72fb1876c6ff944b3.
> 2023-04-03 11:28:53,208 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=78be037bae2fc201707fa511e90dfbbf
> 2023-04-03 11:28:53,208 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3
> 2023-04-03 11:29:53,209 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=78be037bae2fc201707fa511e90dfbbf
> 2023-04-03 11:29:53,209 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3
> {code}
>  
> The stuck state also gets _fixed_ if we kill the pod running the region 
> server that hosts the regions stuck in transition.
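For triage, the repeated STUCK warnings in the log excerpt can be condensed to one entry per region. The following is a minimal sketch, assuming the HMaster log is available as a plain-text file and that the warning keeps the `state=..., location=..., table=..., region=...` format shown in the 2.4.11 excerpt. The class `StuckRitSummary` and its regex are hypothetical and not part of HBase.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: summarizes the
// "STUCK Region-In-Transition" WARN lines from an HMaster log.
public class StuckRitSummary {

    // Captures state, location (host,port,startcode), table and encoded
    // region name; \s+ also tolerates lines that were re-wrapped.
    private static final Pattern STUCK = Pattern.compile(
        "STUCK Region-In-Transition state=(\\w+),\\s+location=(\\S+?),\\s+"
        + "table=(\\S+?),\\s+region=([0-9a-f]+)");

    /** Maps each stuck region's encoded name to "table state@location". */
    public static Map<String, String> summarize(String log) {
        Map<String, String> stuck = new TreeMap<>();
        Matcher m = STUCK.matcher(log);
        while (m.find()) {
            // Later reports for the same region overwrite earlier ones.
            stuck.put(m.group(4), m.group(3) + " " + m.group(1) + "@" + m.group(2));
        }
        return stuck;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the master log file (assumed plain text, UTF-8).
        String log = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        summarize(log).forEach((region, info) -> System.out.println(region + " " + info));
    }
}
```

Applied to the sample logs above, this yields two entries, 78be037bae2fc201707fa511e90dfbbf and b732898573f935b72fb1876c6ff944b3, both `aeris_v2 OPENING@cvp504.sjc.aristanetworks.com,16201,1680509017771`. These can then be compared with the CatalogJanitor `unknown_server` entries that list the same two regions under cvp503.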



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
