[
https://issues.apache.org/jira/browse/HBASE-27773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aaron Beitch updated HBASE-27773:
---------------------------------
Attachment: Procedures.png
> STUCK Region-In-Transition state
> --------------------------------
>
> Key: HBASE-27773
> URL: https://issues.apache.org/jira/browse/HBASE-27773
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.4.11
> Environment: HBase: 2.4.11
> Hadoop: 3.2.4
> ZooKeeper: 3.7.1
> Reporter: Grigore Lupescu
> Priority: Major
> Attachments: Locks.png, Procedures.png, config.txt
>
>
> One problem we encounter with some regularity is the `STUCK
> Region-In-Transition state=OPENING` warning.
> We have a three-server cluster that runs a full HBase stack: 3 ZooKeeper
> nodes, an active and a standby HBase master, 3 region servers, and 3 HDFS
> data nodes.
> We've managed to reproduce the stuck region-in-transition state by rebooting
> one of the 3 nodes at random. This is not necessarily the only way to end up
> in this state, but it is the one way we have found that reproduces it with
> some reliability. The chances of reaching the stuck state increase when
> (a) data is being written to HBase while the node reboots, and also when
> (b) the rebooted node is hosting the active HBase master.
> Sample logs:
>
> {code:java}
> [7745.457s][info][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause)
> 523M->44M(818M) 12.736ms
> [10505.454s][info][gc] GC(13) Pause Young (Normal) (G1 Evacuation Pause)
> 523M->44M(818M) 11.066ms
> 2023-04-03 11:26:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager:
> STUCK Region-In-Transition state=OPENING,
> location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2,
> region=b732898573f935b72fb1876c6ff944b3
> 2023-04-03 11:27:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager:
> STUCK Region-In-Transition state=OPENING,
> location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2,
> region=78be037bae2fc201707fa511e90dfbbf
> 2023-04-03 11:27:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager:
> STUCK Region-In-Transition state=OPENING,
> location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2,
> region=b732898573f935b72fb1876c6ff944b3
> 2023-04-03 11:28:53,145 INFO [master/cvp504:16000.Chore.1] master.HMaster:
> Not running balancer (force=false, metaRIT=false) because 2 region(s) in
> transition: [state=OPENING,
> location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2,
> region=78be037bae2fc201707fa511e90dfbbf, state=OPENING,
> location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2,
> region=b732898573f935b72fb1876c6ff944b3]
> 2023-04-03 11:28:53,168 WARN [master/cvp504:16000.Chore.1]
> janitor.CatalogJanitor:
> unknown_server=cvp503.sjc.aristanetworks.com,16201,1680499899167/aeris_v2,\x09,1680499940070.78be037bae2fc201707fa511e90dfbbf.,
>
> unknown_server=cvp503.sjc.aristanetworks.com,16201,1680499899167/aeris_v2,\x12,1680499940070.b732898573f935b72fb1876c6ff944b3.
> 2023-04-03 11:28:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager:
> STUCK Region-In-Transition state=OPENING,
> location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2,
> region=78be037bae2fc201707fa511e90dfbbf
> 2023-04-03 11:28:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager:
> STUCK Region-In-Transition state=OPENING,
> location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2,
> region=b732898573f935b72fb1876c6ff944b3
> 2023-04-03 11:29:53,209 WARN [ProcExecTimeout] assignment.AssignmentManager:
> STUCK Region-In-Transition state=OPENING,
> location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2,
> region=78be037bae2fc201707fa511e90dfbbf
> 2023-04-03 11:29:53,209 WARN [ProcExecTimeout] assignment.AssignmentManager:
> STUCK Region-In-Transition state=OPENING,
> location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2,
> region=b732898573f935b72fb1876c6ff944b3{code}
>
> The stuck state also gets _fixed_ if we kill the pod running the region
> server that hosts the region stuck in transition.
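
For triage, the repeated warnings in the sample logs can be condensed into a per-region count. The following is a minimal Python sketch and is not part of the original report. The regular expression is inferred only from the log lines quoted above, so it may not match other HBase versions. The names `STUCK_RE`, `count_stuck_regions`, and `SAMPLE` are hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the AssignmentManager warnings quoted in the report.
# Assumption: other HBase versions may format this message differently.
STUCK_RE = re.compile(
    r"STUCK Region-In-Transition state=(?P<state>\w+),\s*"
    r"location=(?P<location>\S+),\s*"
    r"table=(?P<table>[^,\s]+),\s*"
    r"region=(?P<region>[0-9a-f]{32})"
)


def count_stuck_regions(log_text):
    """Count STUCK warnings per (table, region, state, location)."""
    # Collapse all whitespace so that wrapped log lines still match.
    flat = " ".join(log_text.split())
    counts = Counter()
    for m in STUCK_RE.finditer(flat):
        key = (m["table"], m["region"], m["state"], m["location"])
        counts[key] += 1
    return counts


# Three warnings copied from the sample logs (without the "> " quote prefix).
SAMPLE = """
2023-04-03 11:26:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager:
STUCK Region-In-Transition state=OPENING,
location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2,
region=b732898573f935b72fb1876c6ff944b3
2023-04-03 11:27:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager:
STUCK Region-In-Transition state=OPENING,
location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2,
region=78be037bae2fc201707fa511e90dfbbf
2023-04-03 11:27:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager:
STUCK Region-In-Transition state=OPENING,
location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2,
region=b732898573f935b72fb1876c6ff944b3
"""

if __name__ == "__main__":
    for (table, region, state, location), n in count_stuck_regions(SAMPLE).most_common():
        print(f"{table} {region} state={state} on {location}: {n} warning(s)")
```

Run against a full master log, a count that keeps growing for the same region and location suggests the region is not making progress on that server, which matches the behavior described above.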