[ 
https://issues.apache.org/jira/browse/HBASE-28522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prathyusha updated HBASE-28522:
-------------------------------
    Description: 
One scenario we noticed in production:

A DisableTableProc and an SCP (ServerCrashProcedure) were triggered at almost the same time.

2024-03-16 17:59:23,014 INFO [PEWorker-11] procedure.DisableTableProcedure - 
Set <TABLE_NAME> to state=DISABLING

2024-03-16 17:59:15,243 INFO [PEWorker-26] procedure.ServerCrashProcedure - 
Start pid=21592440, state=RUNNABLE:SERVER_CRASH_START, locked=true; 
ServerCrashProcedure 
<regionserver>, splitWal=true, meta=false

DisableTableProc creates UNASSIGN procs while the ASSIGNs of the SCP have not 
yet completed

{{2024-03-16 17:59:23,003 DEBUG [PEWorker-40] procedure2.ProcedureExecutor - 
LOCK_EVENT_WAIT pid=21594220, ppid=21592440, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
TransitRegionStateProcedure table=<TABLE_NAME>, region=<regionhash>, ASSIGN}}

The UNASSIGN created by DisableTableProc is stuck on the dead regionserver, and 
we had to manually bypass the UNASSIGN of DisableTableProc and then do the ASSIGN.

If we can break the retry loop so that the UNASSIGN procedure does not retry when 
there is an SCP for that server, no manual intervention is needed.
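
To illustrate the proposed behavior, here is a minimal sketch of the decision the UNASSIGN retry loop could make after a failed close attempt. It is not HBase code: the class UnassignRetrySketch, the Action enum, the method afterFailedClose, and the serversWithActiveScp set are all hypothetical names, and how the real procedure would learn that an SCP exists for the server is left open.

```java
import java.util.Set;

/** Hypothetical sketch only; none of these names exist in HBase. */
final class UnassignRetrySketch {

  /** Possible next steps for an UNASSIGN whose close request failed. */
  enum Action { RETRY, STOP_AND_DEFER_TO_SCP }

  /**
   * Decides what an UNASSIGN should do after failing to close a region
   * on targetServer.
   *
   * serversWithActiveScp is an assumed view of the servers that currently
   * have a ServerCrashProcedure in flight.
   */
  static Action afterFailedClose(String targetServer, Set<String> serversWithActiveScp) {
    if (serversWithActiveScp.contains(targetServer)) {
      // The server is already being handled as crashed, so retrying the
      // close against it cannot succeed; leave the region to the SCP.
      return Action.STOP_AND_DEFER_TO_SCP;
    }
    // No SCP for this server: the failure may be transient, keep retrying.
    return Action.RETRY;
  }
}
```

With this shape, the UNASSIGN from the scenario above would stop retrying as soon as the SCP for the dead regionserver is visible, instead of looping until someone bypasses it by hand.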

  was:
One scenario we noticed in production -

A DisableTableProc and an SCP (ServerCrashProcedure) were triggered at almost the same time.

2024-03-16 17:59:23,014 INFO [PEWorker-11] procedure.DisableTableProcedure - 
Set <TABLE_NAME> to state=DISABLING

2024-03-16 17:59:15,243 INFO [PEWorker-26] procedure.ServerCrashProcedure - 
Start pid=21592440, state=RUNNABLE:SERVER_CRASH_START, locked=true; 
ServerCrashProcedure 
<regionserver>, splitWal=true, meta=false

DisableTableProc creates UNASSIGN procs while the ASSIGNs of the SCP have not 
yet completed

{{2024-03-16 17:59:23,003 DEBUG [PEWorker-40] procedure2.ProcedureExecutor - 
LOCK_EVENT_WAIT pid=21594220, ppid=21592440, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
TransitRegionStateProcedure table=<TABLE_NAME>, region=<regionhash>, ASSIGN}}

{{UNASSIGN created by DisableTableProc is stuck on the dead regionserver and we 
had to manually bypass unassign of }}{{DisableTableProc}}{{ and then do 
ASSIGN.}}

{{If we can break the loop for UNASSIGN procedure to not retry if there is scp 
for that server, we do not need manual intervention}}


> UNASSIGN proc indefinitely stuck on dead rs
> -------------------------------------------
>
>                 Key: HBASE-28522
>                 URL: https://issues.apache.org/jira/browse/HBASE-28522
>             Project: HBase
>          Issue Type: Improvement
>          Components: proc-v2
>            Reporter: Prathyusha
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
