[
https://issues.apache.org/jira/browse/HBASE-21095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang updated HBASE-21095:
------------------------------
Description: For TRSP (TransitRegionStateProcedure), and also for RTP
(RegionTransitionProcedure) in branch-2.0 and branch-2.1, if we fail to assign
or unassign a region, we set the procedure to the WAITING_TIMEOUT state and
rely on the ProcedureEvent in RegionStateNode (RSN) to wake us up later. But
after a master restart, we do not suspend the ProcedureEvent in the RSN, and we
do not add the procedure to the ProcedureEvent's suspended queue either, so the
procedure hangs forever because nothing will wake it up. (was: It also uses
TRSP as a sub procedure, so we should probably set killIfHasParent to true. But
the log is a bit interesting: we just hang there without executing any
procedures after a restart, whereas in the other tests where we need to set
killIfHasParent to true, we keep executing procedures but do not make any
progress.
Need to dig more.)
> The timeout retry logic for several procedures are broken after master
> restarts
> -------------------------------------------------------------------------------
>
> Key: HBASE-21095
> URL: https://issues.apache.org/jira/browse/HBASE-21095
> Project: HBase
> Issue Type: Bug
> Components: amv2, proc-v2
> Reporter: Duo Zhang
> Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
>
> For TRSP (TransitRegionStateProcedure), and also for RTP
> (RegionTransitionProcedure) in branch-2.0 and branch-2.1, if we fail to
> assign or unassign a region, we set the procedure to the WAITING_TIMEOUT
> state and rely on the ProcedureEvent in RegionStateNode (RSN) to wake us up
> later. But after a master restart, we do not suspend the ProcedureEvent in
> the RSN, and we do not add the procedure to the ProcedureEvent's suspended
> queue either, so the procedure hangs forever because nothing will wake it up.
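
To make the hang concrete, here is a minimal toy model of the situation
described above. It is only a sketch under stated assumptions: the classes
Proc, Event and WakeAfterRestartSketch and all of their methods are
hypothetical stand-ins written for this illustration, not the real HBase
ProcedureEvent, Procedure or RegionStateNode classes. The reloadAndRepark
variant is included purely for contrast and is not claimed to be the fix that
was applied for this issue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: hypothetical stand-ins, not HBase's real classes.
final class WakeAfterRestartSketch {

  enum State { RUNNABLE, WAITING_TIMEOUT }

  static final class Proc {
    final String name;
    State state = State.RUNNABLE;
    Proc(String name) { this.name = name; }
  }

  // An event that parks procedures until wake() is called.
  static final class Event {
    private boolean suspended = false;
    private final Deque<Proc> waiters = new ArrayDeque<>();

    void suspend() { suspended = true; }

    // Parks the procedure only if the event is currently suspended.
    boolean suspendIfNotReady(Proc p) {
      if (!suspended) {
        return false;
      }
      waiters.add(p);
      return true;
    }

    // Marks the event ready, makes every parked procedure runnable,
    // and returns how many procedures were woken.
    int wake() {
      suspended = false;
      int woken = 0;
      Proc p;
      while ((p = waiters.poll()) != null) {
        p.state = State.RUNNABLE;
        woken++;
      }
      return woken;
    }
  }

  // Normal path: a failed assign/unassign suspends the event, parks the
  // procedure on it, and then waits.
  static void failAndWait(Proc p, Event e) {
    e.suspend();
    e.suspendIfNotReady(p);
    p.state = State.WAITING_TIMEOUT;
  }

  // Restart path as described in the issue: only the persisted procedure
  // state comes back. The event is not suspended and the procedure is not
  // parked on it.
  static Proc reloadWithoutEvent(String name) {
    Proc p = new Proc(name);
    p.state = State.WAITING_TIMEOUT;
    return p;
  }

  // Contrast case (an assumption for illustration): re-park on reload.
  static Proc reloadAndRepark(String name, Event e) {
    Proc p = new Proc(name);
    failAndWait(p, e);
    return p;
  }

  public static void main(String[] args) {
    Event before = new Event();
    Proc p1 = new Proc("assign-before-restart");
    failAndWait(p1, before);
    System.out.println("woken before restart: " + before.wake() + ", state=" + p1.state);

    Event after = new Event(); // in-memory event state is rebuilt from scratch
    Proc p2 = reloadWithoutEvent("assign-after-restart");
    System.out.println("woken after restart: " + after.wake() + ", state=" + p2.state);
  }
}
```

In this model the wake-up after the simulated restart finds an empty queue, so
the reloaded procedure stays in WAITING_TIMEOUT, which mirrors the "hang there
forever" behaviour reported above.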