[
https://issues.apache.org/jira/browse/HBASE-27773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083736#comment-18083736
]
Umesh Kumar Kumawat commented on HBASE-27773:
---------------------------------------------
In such a case there are three possibilities:
# The master is not able to communicate with the RS (search the master logs
for the RS start code; if they show connection errors, this is the case).
# The region server is taking too long to open the region (search the target
RS logs for the region hash).
# The region server is not able to report the region back to the master (if
the RS logs show that the open completed, check the RS logs for errors
connecting to the master).
If anyone faces the same issue in the future, please collect these logs and
attach them.
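The three checks above come down to searching logs for a start code, a region hash, or the master host. Below is a minimal Java sketch of that search, assuming the master and region server logs have been copied to local files. The class `RitLogTriage`, its argument order, and the error markers it looks for are hypothetical (not part of HBase), and a matching line is only a lead to read in context, not proof of the cause.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper, not part of HBase: a grep-like filter for the three
 * checks in the comment. Matching is plain substring search; the error
 * markers are assumptions, not an exhaustive list of HBase error messages.
 */
public final class RitLogTriage {

    // Assumed substrings that suggest a connection or RPC problem.
    private static final List<String> ERROR_MARKERS =
        List.of("ERROR", "Exception", "refused", "timed out");

    /** Lines that mention the token (an RS start code, region hash or host). */
    static List<String> mentioning(List<String> lines, String token) {
        return lines.stream().filter(l -> l.contains(token)).collect(Collectors.toList());
    }

    /** Lines that mention the token and also contain one of the error markers. */
    static List<String> errorsMentioning(List<String> lines, String token) {
        return mentioning(lines, token).stream()
            .filter(l -> ERROR_MARKERS.stream().anyMatch(l::contains))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 5) {
            System.err.println("usage: RitLogTriage <master-log> <rs-log> "
                + "<rs-start-code> <region-hash> <master-host>");
            System.exit(2);
        }
        List<String> masterLog = Files.readAllLines(Path.of(args[0]));
        List<String> rsLog = Files.readAllLines(Path.of(args[1]));

        // Check 1: master-side errors that mention the RS start code.
        errorsMentioning(masterLog, args[2]).forEach(l -> System.out.println("[1 master] " + l));
        // Check 2: everything the target RS logged about the region hash.
        mentioning(rsLog, args[3]).forEach(l -> System.out.println("[2 rs] " + l));
        // Check 3: RS-side errors that mention the master host.
        errorsMentioning(rsLog, args[4]).forEach(l -> System.out.println("[3 rs] " + l));
    }
}
```

For the sample logs in the description, something like `java RitLogTriage.java master.log regionserver.log 1680509017771 b732898573f935b72fb1876c6ff944b3 cvp504` (log file names hypothetical) would run all three checks for one of the stuck regions.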
> STUCK Region-In-Transition state
> --------------------------------
>
> Key: HBASE-27773
> URL: https://issues.apache.org/jira/browse/HBASE-27773
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.4.11
> Environment: HBase: 2.4.11
> Hadoop: 3.2.4
> ZooKeeper: 3.7.1
> Reporter: Grigore Lupescu
> Priority: Major
> Attachments: Locks.png, Procedures.png, config.txt
>
>
> One problem we encounter with some regularity is the `STUCK
> Region-In-Transition state=OPENING`.
> We have a three-server cluster that runs a full HBase stack: 3 ZooKeeper
> nodes, an active and a standby HBase master, 3 region servers, and 3 HDFS
> DataNodes.
> We've managed to reproduce the stuck region-in-transition state by
> rebooting one of the 3 nodes at random. This is not necessarily the only way
> to end up in this state, but it is a way we managed to reproduce it fairly
> reliably. The stuck state becomes more likely if (a) data is being written to
> HBase while the node reboots, or (b) the rebooted node is also the active
> HBase master.
> Sample logs:
>
> {noformat}
> [7745.457s][info][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause) 523M->44M(818M) 12.736ms
> [10505.454s][info][gc] GC(13) Pause Young (Normal) (G1 Evacuation Pause) 523M->44M(818M) 11.066ms
> 2023-04-03 11:26:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3
> 2023-04-03 11:27:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=78be037bae2fc201707fa511e90dfbbf
> 2023-04-03 11:27:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3
> 2023-04-03 11:28:53,145 INFO [master/cvp504:16000.Chore.1] master.HMaster: Not running balancer (force=false, metaRIT=false) because 2 region(s) in transition: [state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=78be037bae2fc201707fa511e90dfbbf, state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3]
> 2023-04-03 11:28:53,168 WARN [master/cvp504:16000.Chore.1] janitor.CatalogJanitor:
> unknown_server=cvp503.sjc.aristanetworks.com,16201,1680499899167/aeris_v2,\x09,1680499940070.78be037bae2fc201707fa511e90dfbbf.,
>
> unknown_server=cvp503.sjc.aristanetworks.com,16201,1680499899167/aeris_v2,\x12,1680499940070.b732898573f935b72fb1876c6ff944b3.
> 2023-04-03 11:28:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=78be037bae2fc201707fa511e90dfbbf
> 2023-04-03 11:28:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3
> 2023-04-03 11:29:53,209 WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=78be037bae2fc201707fa511e90dfbbf
> 2023-04-03 11:29:53,209 WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3
> {noformat}
>
> The stuck state also gets _fixed_ if we kill the pod running the region
> server that hosts the region stuck in transition.
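When several regions are stuck, the `STUCK Region-In-Transition` warnings in the sample logs above can be parsed to list each region together with the server (host,port,startcode) the master is waiting on. That start code is what the first check in the comment searches for. Below is a minimal Java sketch, assuming each log record sits on a single line as in the raw log file. The class `StuckRitParser` and its regular expression are hypothetical and only cover the fields visible in these sample lines.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of HBase: maps each encoded region name found in
 * "STUCK Region-In-Transition" warnings to the server (host,port,startcode)
 * the master is waiting on. Assumes one log record per line.
 */
public final class StuckRitParser {

    // Field layout taken from the sample AssignmentManager warnings.
    private static final Pattern STUCK = Pattern.compile(
        "STUCK Region-In-Transition state=(\\w+), location=([^,]+,\\d+,\\d+), "
            + "table=([^,]+), region=([0-9a-f]+)");

    /** Encoded region name -> location, in first-seen order. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> regionToLocation = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = STUCK.matcher(line);
            if (m.find()) {
                // Group 4 is the encoded region name, group 2 the server name.
                regionToLocation.put(m.group(4), m.group(2));
            }
        }
        return regionToLocation;
    }

    /** The start code is the last comma-separated field of a server name. */
    static String startCode(String location) {
        return location.substring(location.lastIndexOf(',') + 1);
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "2023-04-03 11:26:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager: "
                + "STUCK Region-In-Transition state=OPENING, "
                + "location=cvp504.sjc.aristanetworks.com,16201,1680509017771, "
                + "table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3",
            "2023-04-03 11:27:53,208 WARN [ProcExecTimeout] assignment.AssignmentManager: "
                + "STUCK Region-In-Transition state=OPENING, "
                + "location=cvp504.sjc.aristanetworks.com,16201,1680509017771, "
                + "table=aeris_v2, region=78be037bae2fc201707fa511e90dfbbf");
        parse(sample).forEach((region, location) -> System.out.println(
            region + " -> " + location + " (start code " + startCode(location) + ")"));
    }
}
```

Keying the map by encoded region name collapses the repeated once-a-minute warnings into one entry per region, and the "Not running balancer" line is ignored because it lacks the `STUCK` prefix.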
--
This message was sent by Atlassian Jira
(v8.20.10#820010)