[jira] [Commented] (HBASE-28522) UNASSIGN proc indefinitely stuck on dead rs

Viraj Jasani (Jira) Wed, 08 May 2024 21:06:05 -0700


    [ https://issues.apache.org/jira/browse/HBASE-28522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844849#comment-17844849 ]


Viraj Jasani commented on HBASE-28522:
--------------------------------------

{code:java}
if (
  env.getMasterServices().getTableStateManager().isTableState(regionNode.getTable(),
    TableState.State.DISABLING)
) {
  // We need to change the state here otherwise the TRSP scheduled by DTP will try to
  // close the region from a dead server and will never succeed. Please see HBASE-23636
  // for more details.
  env.getAssignmentManager().regionClosedAbnormally(regionNode);
  LOG.info("{} found table disabling for region {}, set it state to ABNORMALLY_CLOSED.",
    this, regionNode);
  continue;
}
{code}
[~zhangduo] why don't we set the region state to CLOSED here, given that DTP 
(DisableTableProcedure) is a special case where TRSP (TransitRegionStateProcedure) 
is used only to unassign and close the regions?
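
To make the question concrete, here is a toy sketch. It is not HBase code: the class, enum, and method names are hypothetical. It models only what the code comment above states, namely that a close attempt against a dead server never succeeds, plus the assumption that a region already in a closed state needs no close attempt. Under that simplified model CLOSED and ABNORMALLY_CLOSED behave the same for an unassign; it does not capture any other difference between the two states.

```java
import java.util.Set;

public class DisablingRegionSketch {
  // Hypothetical, reduced set of region states for illustration only.
  public enum RegionState { OPEN, CLOSED, ABNORMALLY_CLOSED }

  /**
   * Toy stand-in for one unassign step. Returns true when the unassign can
   * finish, false when it would have to keep retrying.
   */
  public static boolean unassignCanFinish(RegionState state, String server,
      Set<String> liveServers) {
    if (state != RegionState.OPEN) {
      // Already closed one way or another: no close request is needed.
      return true;
    }
    // Assumption: a close request can only succeed on a live server.
    return liveServers.contains(server);
  }

  public static void main(String[] args) {
    Set<String> live = Set.of("rs1");
    // Region still marked OPEN on the dead server rs2: never finishes.
    System.out.println(unassignCanFinish(RegionState.OPEN, "rs2", live));
    // Region marked ABNORMALLY_CLOSED by the crash handling: finishes.
    System.out.println(unassignCanFinish(RegionState.ABNORMALLY_CLOSED, "rs2", live));
    // Region marked CLOSED instead: also finishes in this toy model.
    System.out.println(unassignCanFinish(RegionState.CLOSED, "rs2", live));
  }
}
```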

> UNASSIGN proc indefinitely stuck on dead rs
> -------------------------------------------
>
>                 Key: HBASE-28522
>                 URL: https://issues.apache.org/jira/browse/HBASE-28522
>             Project: HBase
>          Issue Type: Improvement
>          Components: proc-v2
>            Reporter: Prathyusha
>            Assignee: Prathyusha
>            Priority: Minor
>
> One scenario we noticed in production:
> a DisableTableProcedure and a ServerCrashProcedure (SCP) were triggered at almost the same time.
> 2024-03-16 17:59:23,014 INFO [PEWorker-11] procedure.DisableTableProcedure - Set <TABLE_NAME> to state=DISABLING
> 2024-03-16 17:59:15,243 INFO [PEWorker-26] procedure.ServerCrashProcedure - Start pid=21592440, state=RUNNABLE:SERVER_CRASH_START, locked=true; ServerCrashProcedure <regionserver>, splitWal=true, meta=false
> DisableTableProcedure creates UNASSIGN procedures, and at this point the ASSIGNs of the SCP have not completed:
> {{2024-03-16 17:59:23,003 DEBUG [PEWorker-40] procedure2.ProcedureExecutor - LOCK_EVENT_WAIT pid=21594220, ppid=21592440, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=<TABLE_NAME>, region=<regionhash>, ASSIGN}}
> The UNASSIGN created by DisableTableProcedure is stuck on the dead regionserver, and we 
> had to manually bypass the UNASSIGN of DisableTableProcedure and then do the ASSIGN.
> If we can break the loop so that the UNASSIGN procedure does not retry when there is an SCP 
> for that server, would that remove the need for manual intervention? At the least, could the 
> DisableTableProcedure go to a rollback state?
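
The idea in the description above is to stop retrying the UNASSIGN once an SCP exists for the target server. It could be sketched roughly as follows. This is a hypothetical model, not the TransitRegionStateProcedure implementation: `UnassignRetrySketch`, `Outcome`, `CloseAttempt`, and the `serversWithScp` set are invented for illustration, and what the caller does with a give-up outcome (rollback, or treating the region as closed) is left open.

```java
import java.util.Set;

public class UnassignRetrySketch {
  // Hypothetical outcomes of an unassign attempt loop.
  public enum Outcome { CLOSED, GAVE_UP_SERVER_DEAD, RETRIES_EXHAUSTED }

  // Hypothetical stand-in for sending a close request to a region server.
  public interface CloseAttempt {
    boolean tryClose(String server);
  }

  /**
   * Retries the close up to maxAttempts times, but checks before every
   * attempt whether a crash procedure already exists for the server and
   * stops retrying if so.
   */
  public static Outcome unassign(String server, Set<String> serversWithScp,
      CloseAttempt attempt, int maxAttempts) {
    for (int i = 0; i < maxAttempts; i++) {
      if (serversWithScp.contains(server)) {
        // The server is being handled by an SCP; retrying cannot succeed.
        return Outcome.GAVE_UP_SERVER_DEAD;
      }
      if (attempt.tryClose(server)) {
        return Outcome.CLOSED;
      }
    }
    return Outcome.RETRIES_EXHAUSTED;
  }

  public static void main(String[] args) {
    // rs2 has an SCP: the loop exits at once without sending a close request.
    System.out.println(unassign("rs2", Set.of("rs2"), s -> false, 5));
    // rs1 is healthy and the close succeeds on the first attempt.
    System.out.println(unassign("rs1", Set.of("rs2"), s -> true, 5));
  }
}
```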



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
