[
https://issues.apache.org/jira/browse/HBASE-20728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878993#comment-16878993
]
Xiaolin Ha commented on HBASE-20728:
------------------------------------
I can reproduce this error with the following steps:
# add more than one server to an rsgroup,
# move a table to this rsgroup,
# move all of the table's regions to one server of this rsgroup (this is
important: it makes sure each region's 'lastHost' is in this rsgroup, otherwise
it may be in another group),
# stop all the region servers in this rsgroup (better to wait a while),
# restart the servers in this rsgroup,
# a stuck RIT appears, and the RS name in the {{RIT}} message has the old timestamp,
logs like: WARN [ProcExecTimeout] assignment.AssignmentManager(1328): STUCK
Region-In-Transition rit=OPEN, location=localhost,32843,1562307050191,
table=Group_testKillAllRSInGroupAndThenAddNew,
region=a763499801435d2f78ab42876c6cb3ec
# if step 5 is changed to adding a new server to this rsgroup instead, the RIT
message in step 6 should still show the old RS info.
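To check a log for this symptom, the RS name in the {{RIT}} message can be compared with the servers that are currently live. The following is only a rough sketch, assuming the usual host,port,startcode form of the server name shown in the log above; the helper names ({{parse_rit_location}}, {{find_restarted_instance}}) and the "live" start code in the usage note are made up for illustration and are not part of HBase.

```python
import re
from typing import NamedTuple, Optional, Sequence


class ServerName(NamedTuple):
    """Illustrative stand-in for a region server name: host,port,startcode."""
    host: str
    port: int
    start_code: int  # timestamp assigned when the server process started


# Matches the "location=host,port,startcode" field of a STUCK RIT log line.
LOCATION_RE = re.compile(r"location=([^,\s]+),(\d+),(\d+)")


def parse_rit_location(log_line: str) -> Optional[ServerName]:
    """Extract the server name from a RIT log line, or None if absent."""
    match = LOCATION_RE.search(log_line)
    if match is None:
        return None
    return ServerName(match.group(1), int(match.group(2)), int(match.group(3)))


def find_restarted_instance(
    location: ServerName, live_servers: Sequence[ServerName]
) -> Optional[ServerName]:
    """Return the live server on the same host and port with a newer start
    code, i.e. evidence that the RIT location points at a dead instance."""
    for server in live_servers:
        same_endpoint = (server.host, server.port) == (location.host, location.port)
        if same_endpoint and server.start_code > location.start_code:
            return server
    return None
```

For example, with the location from the log above and a hypothetical restarted instance on the same port:

```python
line = ("STUCK Region-In-Transition rit=OPEN, "
        "location=localhost,32843,1562307050191, "
        "table=Group_testKillAllRSInGroupAndThenAddNew")
stuck = parse_rit_location(line)
live = [ServerName("localhost", 32843, 1562307100000)]  # made-up start code
print(find_restarted_instance(stuck, live))
```

A non-None result means the region is still pinned to the pre-restart instance of that server.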
The root cause of this problem is the same as HBASE-20368. We discussed it at:
https://github.com/apache/hbase/pull/354
> Failure and recovery of all RSes in a RSgroup requires master restart for
> region assignments
> --------------------------------------------------------------------------------------------
>
> Key: HBASE-20728
> URL: https://issues.apache.org/jira/browse/HBASE-20728
> Project: HBase
> Issue Type: Bug
> Components: master, rsgroup
> Reporter: Biju Nair
> Assignee: Sakthi
> Priority: Minor
>
> If all the RSes in an RSgroup hosting user tables fail and recover, master
> still looks for the old RSes (with the old timestamp in the RS identifier) to
> assign regions, i.e. regions are left in transition, making the tables in the
> RSGroup unavailable. Users need to restart {{master}} or manually assign the
> regions to make the tables available. Steps to recreate the scenario in a
> local cluster:
> - Add required properties to {{site.xml}} to enable {{rsgroup}} and start
> hbase
> - Bring up multiple region servers using {{local-regionservers.sh start}}
> - Create a {{rsgroup}} and move a subset of {{regionservers}} to the group
> - Create a table, move it to the group and put some data
> - Stop the {{regionservers}} in the group and restart them
> - From the {{master UI}}, we can see that the region for the table is in
> transition and the RS name in the {{RIT}} message has the old timestamp.