[ https://issues.apache.org/jira/browse/HBASE-20368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878999#comment-16878999 ]
Xiaolin Ha commented on HBASE-20368:
------------------------------------
See this log of a stuck region:
WARN [ProcExecTimeout] assignment.AssignmentManager(1328): STUCK Region-In-Transition rit=OPEN, location=localhost,32843,1562307050191, table=Group_testKillAllRSInGroupAndThenAddNew, region=a763499801435d2f78ab42876c6cb3ec
I think the region state 'OPEN' here may be wrong and confusing. When SCP starts and
creates TRSPs, should these new TRSPs also call serverCrashed() to set the region
state to 'ABNORMALLY_CLOSED'? Are there any concerns if region assignment begins from
state 'ABNORMALLY_CLOSED'? [~zghaobac], [~Apache9]
Relevant code in SCP:
{quote}
private void assignRegions(MasterProcedureEnv env, List<RegionInfo> regions) throws IOException {
  AssignmentManager am = env.getMasterServices().getAssignmentManager();
  for (RegionInfo region : regions) {
    RegionStateNode regionNode = am.getRegionStates().getOrCreateRegionStateNode(region);
    regionNode.lock();
    try {
      if (regionNode.getProcedure() != null) {
        LOG.info("{} found RIT {}; {}", this, regionNode.getProcedure(), regionNode);
        regionNode.getProcedure().serverCrashed(env, regionNode, getServerName());
      } else {
        if (env.getMasterServices().getTableStateManager().isTableState(regionNode.getTable(),
          TableState.State.DISABLING, TableState.State.DISABLED)) {
          continue;
        }
        TransitRegionStateProcedure proc = TransitRegionStateProcedure.assign(env, region, null);
        regionNode.setProcedure(proc);
        addChildProcedure(proc);
      }
    } finally {
      regionNode.unlock();
    }
  }
}
{quote}
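To make the question concrete, here is a minimal, self-contained sketch of the proposed behavior. It is not HBase code: `ScpAssignSketch`, `RegionNode`, and `State` are hypothetical stand-ins for the SCP logic and `RegionStateNode`, and the sketch simply assumes that marking a region on the crashed server as 'ABNORMALLY_CLOSED' before scheduling the new assign is what the change would amount to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the SCP assign path; not the real HBase classes.
class ScpAssignSketch {
    enum State { OPEN, ABNORMALLY_CLOSED }

    static final class RegionNode {
        final String encodedName;
        State state;
        String location;        // server the region was last known to be on
        boolean hasProcedure;   // stands in for regionNode.getProcedure() != null

        RegionNode(String encodedName, State state, String location) {
            this.encodedName = encodedName;
            this.state = state;
            this.location = location;
        }
    }

    /**
     * Models the proposal: whether or not a procedure is already attached,
     * a region located on the crashed server is marked ABNORMALLY_CLOSED.
     * Returns the names of regions for which a new assign was scheduled.
     */
    static List<String> assignRegions(List<RegionNode> regions, String crashedServer) {
        List<String> scheduled = new ArrayList<>();
        for (RegionNode node : regions) {
            if (crashedServer.equals(node.location)) {
                // Assumed effect of serverCrashed(): stale OPEN state is replaced.
                node.state = State.ABNORMALLY_CLOSED;
                node.location = null;
            }
            if (!node.hasProcedure) {
                // Stands in for creating a TRSP and adding it as a child procedure.
                node.hasProcedure = true;
                scheduled.add(node.encodedName);
            }
        }
        return scheduled;
    }

    public static void main(String[] args) {
        String crashed = "localhost,32843,1562307050191";
        RegionNode node =
            new RegionNode("a763499801435d2f78ab42876c6cb3ec", State.OPEN, crashed);
        List<String> scheduled = assignRegions(List.of(node), crashed);
        System.out.println(scheduled + " state=" + node.state);
    }
}
```

With this shape, a region that later gets stuck in transition would be reported as 'ABNORMALLY_CLOSED' rather than 'OPEN' on a dead server. Whether the real assign path accepts that starting state is exactly the open question above.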
> Fix RIT stuck when a rsgroup has no online servers but AM's
> pendingAssignQueue is cleared
> -----------------------------------------------------------------------------------------
>
> Key: HBASE-20368
> URL: https://issues.apache.org/jira/browse/HBASE-20368
> Project: HBase
> Issue Type: Bug
> Components: rsgroup
> Affects Versions: 2.0.0
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Major
> Fix For: 2.0.6, 2.1.6
>
> Attachments: HBASE-20368.branch-2.001.patch,
> HBASE-20368.branch-2.002.patch, HBASE-20368.branch-2.003.patch,
> HBASE-20368.branch-2.003.patch, HBASE-20368.branch-2.003.patch,
> HBASE-20368.branch-2.1.001.patch
>
>
> This error can be reproduced by shutting down all servers in an rsgroup and
> starting them again soon afterwards.
> The regions in this rsgroup will be reassigned, but there are no available
> servers in this rsgroup.
> They will be added to AM's pendingAssignQueue, which AM will clear regardless
> of whether the assignment succeeded in this case.
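The failure mode described above can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the attached patch and not the real AssignmentManager: `PendingAssignQueueSketch`, `queue`, and `processOnce` are hypothetical names, and re-queueing unplaced regions is just one possible way to avoid losing them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical model of a pending-assign queue; not the real AssignmentManager.
class PendingAssignQueueSketch {
    private final Deque<String> pending = new ArrayDeque<>();

    void queue(String region) {
        pending.add(region);
    }

    int pendingCount() {
        return pending.size();
    }

    /**
     * Drains the queue once. The queue is cleared up front, as in the issue
     * description; the difference is that regions whose rsgroup has no online
     * server are put back instead of being silently dropped.
     */
    List<String> processOnce(Map<String, String> regionToGroup,
                             Map<String, List<String>> onlineServersByGroup) {
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        List<String> assigned = new ArrayList<>();
        for (String region : batch) {
            String group = regionToGroup.get(region);
            List<String> servers = onlineServersByGroup.getOrDefault(group, List.of());
            if (servers.isEmpty()) {
                pending.add(region);  // no candidate server yet: retry on a later pass
            } else {
                assigned.add(region + "->" + servers.get(0));
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        PendingAssignQueueSketch am = new PendingAssignQueueSketch();
        am.queue("region-1");
        Map<String, String> regionToGroup = Map.of("region-1", "group-a");

        // All servers of group-a are down: nothing is assigned, the region stays queued.
        System.out.println(am.processOnce(regionToGroup, Map.of()) + " pending=" + am.pendingCount());

        // A server of group-a is back: the retried region is now placed.
        System.out.println(am.processOnce(regionToGroup, Map.of("group-a", List.of("rs-1")))
            + " pending=" + am.pendingCount());
    }
}
```

Without the re-queue step, the first pass would leave the queue empty and the region would stay in transition with nothing left to retry it, which matches the stuck RIT reported here.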