[ 
https://issues.apache.org/jira/browse/HBASE-20728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878993#comment-16878993
 ] 

Xiaolin Ha commented on HBASE-20728:
------------------------------------

I can reproduce this error with the following steps:
 # add more than one server to an rsgroup,
 # move a table to this rsgroup,
 # move all of the table's regions to one server of this rsgroup (this is important: 
it makes sure each region's 'lastHost' is in this rsgroup, otherwise it may be in another group),
 # stop all the region servers in this rsgroup (better to wait a while afterwards),
 # restart the servers in this rsgroup,
 # a stuck RIT appears, and the RS name in the {{RIT}} message has the old timestamp, 
with logs like:  WARN [ProcExecTimeout] assignment.AssignmentManager(1328): STUCK 
Region-In-Transition rit=OPEN, location=localhost,32843,1562307050191, 
table=Group_testKillAllRSInGroupAndThenAddNew, 
region=a763499801435d2f78ab42876c6cb3ec
 # if step 5 is changed to adding a new server to this rsgroup, the RIT message in 
step 6 should still have the old RS info.
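
To check a master log for this symptom, here is a minimal Python sketch. It assumes the {{location}} field has the usual host,port,startcode form, as in the log line above. {{parse_stuck_rit}} and {{is_stale}} are hypothetical helper names, not HBase APIs, and the "live" startcode is made up for illustration.

```python
import re
from typing import Iterable, NamedTuple, Optional


class ServerName(NamedTuple):
    host: str
    port: int
    startcode: int  # start timestamp of the region server process (ms)


class StuckRit(NamedTuple):
    state: str
    location: ServerName
    table: str
    region: str


# Matches the STUCK Region-In-Transition warning quoted in step 6.
STUCK_RIT = re.compile(
    r"STUCK\s+Region-In-Transition\s+rit=(?P<state>\w+),\s*"
    r"location=(?P<location>[^,\s]+,\d+,\d+),\s*"
    r"table=(?P<table>[^,\s]+),\s*"
    r"region=(?P<region>[0-9a-f]+)"
)


def parse_server_name(text: str) -> ServerName:
    """Split 'host,port,startcode' into its three parts."""
    host, port, startcode = text.strip().split(",")
    return ServerName(host, int(port), int(startcode))


def parse_stuck_rit(line: str) -> Optional[StuckRit]:
    """Return the parsed warning, or None if the line is not a STUCK RIT."""
    m = STUCK_RIT.search(line)
    if m is None:
        return None
    return StuckRit(
        state=m["state"],
        location=parse_server_name(m["location"]),
        table=m["table"],
        region=m["region"],
    )


def is_stale(location: ServerName, live: Iterable[ServerName]) -> bool:
    """A location is stale when no live server has exactly that name."""
    return location not in set(live)


LINE = (
    "WARN [ProcExecTimeout] assignment.AssignmentManager(1328): STUCK "
    "Region-In-Transition rit=OPEN, location=localhost,32843,1562307050191, "
    "table=Group_testKillAllRSInGroupAndThenAddNew, "
    "region=a763499801435d2f78ab42876c6cb3ec"
)

rit = parse_stuck_rit(LINE)
# Hypothetical server list after the restart: same host and port, new startcode.
live_servers = [ServerName("localhost", 32843, 1562307099999)]
if rit is not None and is_stale(rit.location, live_servers):
    print(f"region {rit.region} still points at old server {rit.location}")
```

A location that matches a live server on host and port but not on startcode is the "old timestamp" case described in step 6.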

The root cause of this problem is the same as that of HBASE-20368. We discussed it at: 
https://github.com/apache/hbase/pull/354

> Failure and recovery of all RSes in a RSgroup requires master restart for 
> region assignments
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-20728
>                 URL: https://issues.apache.org/jira/browse/HBASE-20728
>             Project: HBase
>          Issue Type: Bug
>          Components: master, rsgroup
>            Reporter: Biju Nair
>            Assignee: Sakthi
>            Priority: Minor
>
> If all the RSes in a RSgroup hosting user tables fail and recover, master 
> still looks for old RSes (with old timestamp in the RS identifier) to assign 
> regions. As a result, regions are left in transition, making the tables in the 
> RSGroup unavailable. Users need to restart {{master}} or manually assign the 
> regions to make the tables available. Steps to recreate the scenario in a local 
> cluster:
>  - Add required properties to {{site.xml}} to enable {{rsgroup}} and start 
> hbase
>  - Bring up multiple region servers using {{local-regionservers.sh start}}
>  - Create a {{rsgroup}} and move a subset of  {{regionservers}} to the group
>  - Create a table, move it to the group and put some data
>  - Stop the {{regionservers}} in the group and restart them
>  - From the {{master UI}}, we can see that the region for the table is in 
> transition and the RS name in the {{RIT}} message has the old timestamp.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
