oktay tuncay created HBASE-22657:
------------------------------------

             Summary: HBase : STUCK Region-In-Transition 
                 Key: HBASE-22657
                 URL: https://issues.apache.org/jira/browse/HBASE-22657
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.0.0
            Reporter: oktay tuncay


When we check the number of regions in transition in Ambari, it shows 1 
region waiting in transition. (The count is higher than 1 on another cluster.)

Also, when we check the table with the command 
"hbase hbck -details *table_name*", the status is reported as INCONSISTENT:

_There are 0 overlap groups with 0 overlapping regions
ERROR: Found inconsistency in table *Table_Name*
Summary:
Table hbase:meta is okay.
Number of regions: 1
Deployed on: hostname1:port, hostname2:port, hostname3:port, hostname4:port
Table *Table_Name* is okay.
Number of regions: 39
Deployed on: hostname1:port, hostname2:port, hostname3:port, hostname4:port
2 inconsistencies detected.
Status: *INCONSISTENT*

When I checked the log files, I saw the following warning message:

2019-06-09T07:14:15.179+02:00 WARN 
org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK 
Region-In-Transition rit=CLOSING, location=*hostname*,*port*,1558699727048, 
table=*table_name*, region=c67dd5d8bcd174cc2001695c31475ab1

According to this message, the transition of region 
c67dd5d8bcd174cc2001695c31475ab1 on *host* (state CLOSING) is stuck.
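
For triage, the RIT state and the encoded region name can be pulled out of a 
warning line like the one above. This is only a minimal sketch: it assumes the 
warning always has the field layout of the single line quoted here, and the 
class name StuckRitLineParser and its fields are hypothetical, not part of 
HBase.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StuckRitLineParser {
    // Pattern modeled on the one warning quoted in this report;
    // other HBase versions may format the message differently.
    private static final Pattern STUCK_RIT = Pattern.compile(
        "STUCK\\s+Region-In-Transition\\s+rit=(\\w+),\\s*location=(.*?),"
        + "\\s*table=(.*?),\\s*region=([0-9a-f]{32})");

    /** Hypothetical holder for the fields found in the warning. */
    public static final class StuckRegion {
        public final String state;
        public final String location;
        public final String table;
        public final String encodedName;

        StuckRegion(String state, String location, String table, String encodedName) {
            this.state = state;
            this.location = location;
            this.table = table;
            this.encodedName = encodedName;
        }
    }

    /** Returns the parsed fields, or empty if the line is not a STUCK warning. */
    public static Optional<StuckRegion> parse(String logLine) {
        Matcher m = STUCK_RIT.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(
            new StuckRegion(m.group(1), m.group(2), m.group(3), m.group(4)));
    }

    public static void main(String[] args) {
        // Host, port, and table name below are placeholders.
        String line = "WARN AssignmentManager: STUCK Region-In-Transition "
            + "rit=CLOSING, location=host1,16020,1558699727048, "
            + "table=t1, region=c67dd5d8bcd174cc2001695c31475ab1";
        parse(line).ifPresent(r ->
            System.out.println(r.encodedName + " is stuck in " + r.state));
    }
}
```

The extracted encoded name is the value passed to the hbase shell "assign" 
command used in the manual workaround.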

We stopped the RS process on *host* and force-assigned the region to another 
running RS.

*hbase(main):001:0> assign 'c67dd5d8bcd174cc2001695c31475ab1'*

After that operation, the INCONSISTENT status was gone and we restarted the RS 
on *host*.

One reason a region gets stuck in transition is that, while it is being moved 
across regionservers, it is unassigned from the source regionserver but is 
never assigned to another regionserver.

I think the code below is responsible for that process.

private void handleRegionOverStuckWarningThreshold(final RegionInfo regionInfo) {
final RegionStateNode regionNode = regionStates.getRegionStateNode(regionInfo);
//if (regionNode.isStuck()) {
LOG.warn("STUCK Region-In-Transition {}", regionNode);_

It seems one potential way to unstick the region is to send a close request to 
the region server. The transition may be blocked because another Procedure 
holds the exclusive lock and is not releasing it.

My question is: what is the root cause of this problem? I think HBase should 
be able to fix Region-In-Transition issues itself.
We can fix this problem manually, but some customers do not have this 
knowledge, so I think HBase needs to be able to recover by itself.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
