[ https://issues.apache.org/jira/browse/HBASE-21095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-21095:
------------------------------
    Description: For TRSP (TransitRegionStateProcedure), and also RTP 
(RegionTransitionProcedure) in branch-2.0 and branch-2.1, if we fail to assign 
or unassign a region, we set the procedure to the WAITING_TIMEOUT state and 
rely on the ProcedureEvent in the RegionStateNode (RSN) to wake us up later. 
But after the master restarts, we neither suspend the ProcedureEvent in the RSN 
nor add the procedure to the ProcedureEvent's suspend queue, so the procedure 
hangs there forever because no one will wake it up.  (was: It also uses TRSP 
as a sub-procedure, so we should probably set killIfHasParent to true. But the 
log is a bit interesting: after a restart we just hang there without executing 
any procedures, whereas in other tests where we need to set killIfHasParent to 
true, we keep executing procedures but make no progress.

Need to dig more.)
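To make the failure mode concrete, here is a minimal Java sketch under heavy simplifying assumptions. `ModelEvent`, `ModelProcedure`, `ProcState`, `restart` and `TimeoutWakeupDemo` are hypothetical stand-ins invented for illustration; they are not HBase's real `ProcedureEvent`, `RegionStateNode` or procedure classes, whose APIs differ. The model treats a master restart as rebuilding the in-memory event from scratch while the procedure's persisted WAITING_TIMEOUT state survives, and the `reattach` flag approximates one possible fix: re-suspending the event and re-parking the restored procedure during restore.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of the wake-up mechanism; NOT the real HBase API.
enum ProcState { RUNNABLE, WAITING_TIMEOUT }

final class ModelProcedure {
  final long id;
  ProcState state = ProcState.RUNNABLE;

  ModelProcedure(long id) { this.id = id; }
}

/** Stand-in for the per-region event that parked procedures wait on. */
final class ModelEvent {
  private boolean suspended = false;
  private final Deque<ModelProcedure> waiters = new ArrayDeque<>();

  /** Mark the event as not ready and park the procedure on it. */
  void suspendAndWait(ModelProcedure proc) {
    suspended = true;
    proc.state = ProcState.WAITING_TIMEOUT;
    waiters.add(proc);
  }

  /** Wake every parked procedure; returns the ones moved back to RUNNABLE. */
  List<ModelProcedure> wake() {
    List<ModelProcedure> woken = new ArrayList<>();
    if (!suspended) {
      return woken; // a non-suspended event has nobody waiting on it
    }
    suspended = false;
    while (!waiters.isEmpty()) {
      ModelProcedure p = waiters.poll();
      p.state = ProcState.RUNNABLE;
      woken.add(p);
    }
    return woken;
  }

  boolean isSuspended() { return suspended; }
  int waiterCount() { return waiters.size(); }
}

public class TimeoutWakeupDemo {
  /**
   * Simulates a restart: only the procedure's persisted state survives and the
   * event is rebuilt empty. reattach=false models the bug, reattach=true the fix.
   */
  static ModelEvent restart(ModelProcedure restored, boolean reattach) {
    ModelEvent fresh = new ModelEvent();
    if (reattach && restored.state == ProcState.WAITING_TIMEOUT) {
      fresh.suspendAndWait(restored);
    }
    return fresh;
  }

  public static void main(String[] args) {
    // No restart: a failed assign parks the procedure and a later wake() resumes it.
    ModelProcedure p1 = new ModelProcedure(1);
    ModelEvent ev1 = new ModelEvent();
    ev1.suspendAndWait(p1);
    System.out.println("no restart, woken: " + ev1.wake().size() + ", state: " + p1.state);

    // Restart without re-attaching: wake() finds nothing, so the procedure stays parked.
    ModelProcedure p2 = new ModelProcedure(2);
    new ModelEvent().suspendAndWait(p2);
    ModelEvent ev2 = restart(p2, false);
    System.out.println("buggy restart, woken: " + ev2.wake().size() + ", state: " + p2.state);

    // Restart with re-attaching: the restored procedure can be woken again.
    ModelProcedure p3 = new ModelProcedure(3);
    new ModelEvent().suspendAndWait(p3);
    ModelEvent ev3 = restart(p3, true);
    System.out.println("fixed restart, woken: " + ev3.wake().size() + ", state: " + p3.state);
  }
}
```

In this model the only difference between the stuck and the recovered case is whether the restore path rebuilds the "event is suspended and this procedure waits on it" relationship; the persisted WAITING_TIMEOUT state alone is not enough for anyone to wake the procedure.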

> The timeout retry logic for several procedures are broken after master restarts
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-21095
>                 URL: https://issues.apache.org/jira/browse/HBASE-21095
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2, proc-v2
>            Reporter: Duo Zhang
>            Priority: Major
>             Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
