oktay tuncay created HBASE-22657:
------------------------------------
Summary: HBase : STUCK Region-In-Transition
Key: HBASE-22657
URL: https://issues.apache.org/jira/browse/HBASE-22657
Project: HBase
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: oktay tuncay
When we check the number of regions in transition in Ambari, it shows 1
region waiting in transition. (It is more than 1 on another cluster.)
Also, when we check the table with the command "hbase hbck -details *table_name*",
the status is INCONSISTENT:
_There are 0 overlap groups with 0 overlapping regions
ERROR: Found inconsistency in table *Table_Name*
Summary:
Table hbase:meta is okay.
Number of regions: 1
Deployed on: hostname1:port, hostname2:port, hostname3:port, hostname4:port
Table *Table_Name* is okay.
Number of regions: 39
Deployed on: hostname1:port, hostname2:port, hostname3:port, hostname4:port
2 inconsistencies detected.
Status: *INCONSISTENT*
When I checked the log files, I saw the following warning message:
2019-06-09T07:14:15.179+02:00 WARN
org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK
Region-In-Transition rit=CLOSING, location=*hostname*,*port*,1558699727048,
table=*table_name*, region=c67dd5d8bcd174cc2001695c31475ab1
According to this message, the transition of region
c67dd5d8bcd174cc2001695c31475ab1 on *host* (state CLOSING) is stuck.
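To find every affected region in a master log rather than reading the warnings by hand, the fields of the warning can be extracted with a regular expression. The following is a minimal sketch, not part of HBase: the class name StuckRitLogParser, the sample host rs1.example.com, the port 16020, and the table name my_table are hypothetical, and the pattern assumes the exact layout of the warning quoted above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper (not HBase code): parses a STUCK Region-In-Transition warning. */
public class StuckRitLogParser {

    /** Fields carried by one STUCK warning. */
    public static final class StuckRit {
        public final String state;
        public final String host;
        public final String table;
        public final String encodedRegion;

        StuckRit(String state, String host, String table, String encodedRegion) {
            this.state = state;
            this.host = host;
            this.table = table;
            this.encodedRegion = encodedRegion;
        }
    }

    // Assumed layout, taken from the warning in this report:
    // rit=<STATE>, location=<host>,<port>,<startcode>, table=<table>, region=<32 hex chars>
    private static final Pattern STUCK = Pattern.compile(
        "STUCK Region-In-Transition rit=(\\w+), location=([^,]+),\\d+,\\d+, "
            + "table=([^,]+), region=([0-9a-f]{32})");

    /** Returns the parsed fields, or empty if the line is not a STUCK warning. */
    public static Optional<StuckRit> parse(String logLine) {
        Matcher m = STUCK.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new StuckRit(m.group(1), m.group(2), m.group(3), m.group(4)));
    }

    public static void main(String[] args) {
        // Hypothetical sample line; host, port and table name are placeholders.
        String line = "2019-06-09T07:14:15.179+02:00 WARN "
            + "org.apache.hadoop.hbase.master.assignment.AssignmentManager: "
            + "STUCK Region-In-Transition rit=CLOSING, "
            + "location=rs1.example.com,16020,1558699727048, "
            + "table=my_table, region=c67dd5d8bcd174cc2001695c31475ab1";
        parse(line).ifPresent(r ->
            System.out.println(r.encodedRegion + " is " + r.state + " on " + r.host));
    }
}
```

The encoded region name returned here is the same value that is passed to the shell's assign command in the manual fix described in this report.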
We stopped the RS process on *host* and forced assignment of the region to
another running RS:
*hbase(main):001:0> assign 'c67dd5d8bcd174cc2001695c31475ab1'*
After that operation, the INCONSISTENT status was gone and we restarted the RS on the host.
One reason a region gets stuck in transition is that, when it is being moved
between regionservers, it is unassigned from the source regionserver but is
never assigned to another regionserver.
I think the code below is responsible for that process:
private void handleRegionOverStuckWarningThreshold(final RegionInfo regionInfo) {
  final RegionStateNode regionNode = regionStates.getRegionStateNode(regionInfo);
  //if (regionNode.isStuck()) {
  LOG.warn("STUCK Region-In-Transition {}", regionNode);
}_
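As the quoted method shows, the handler only logs a warning; it does not attempt recovery. The idea behind such a threshold check can be sketched as a toy model. This is not HBase code: the class StuckRitDetector, its method names, and the 60-second threshold used in main are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (not HBase code): flags regions whose transition exceeds a threshold. */
public class StuckRitDetector {
    private final long thresholdMillis;
    // Encoded region name -> time (ms) at which its transition started.
    private final Map<String, Long> transitionStart = new LinkedHashMap<>();

    public StuckRitDetector(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Records that a region entered a transition state such as CLOSING. */
    public void transitionStarted(String encodedRegion, long nowMillis) {
        transitionStart.put(encodedRegion, nowMillis);
    }

    /** Records that the transition completed, so the region is no longer tracked. */
    public void transitionFinished(String encodedRegion) {
        transitionStart.remove(encodedRegion);
    }

    /** Returns regions that have been in transition longer than the threshold. */
    public List<String> findStuck(long nowMillis) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Long> e : transitionStart.entrySet()) {
            if (nowMillis - e.getValue() > thresholdMillis) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        StuckRitDetector detector = new StuckRitDetector(60_000L); // assumed threshold
        detector.transitionStarted("c67dd5d8bcd174cc2001695c31475ab1", 0L);
        // 70 s later the region is still in transition, so it is reported.
        for (String region : detector.findStuck(70_000L)) {
            System.out.println("STUCK Region-In-Transition " + region);
        }
    }
}
```

In this model, detection is separate from recovery: findStuck only reports, which mirrors the behaviour described in this report, where an operator still had to reassign the region manually.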
It seems one potential way to unstick the region is to send a close request to
the region server. The region may be blocked because another Procedure holds
the exclusive lock and is not releasing it.
My question is: what is the root cause of this problem? I think HBase
should be able to fix Region-In-Transition issues itself.
We can fix this problem manually, but some customers do not have this
knowledge, so I think HBase needs to be able to recover by itself.