[
https://issues.apache.org/jira/browse/HBASE-21095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592084#comment-16592084
]
stack commented on HBASE-21095:
-------------------------------
Ok. Took me a while to understand what this issue is about. The unit test
helped explain. Thanks.
I just pushed Allan's patch to branch-2.0=>branch-2 but then reverted it. I'll
put it in under a different JIRA; otherwise it will be hard to track what went
in under this issue.
bq. Let me commit. stack, let's also commit HBASE-20881 to branch-2, so that the
fix here can also go into branch-2.
On the above, ok. We have an outline of how to do a rolling upgrade to
branch-2.2, so go ahead. The rolling upgrade issue should be a blocker for
branch-2.2, if it is not one already.
I am not clear on how far back the master-branch patch attached here should
go. And should [~allan163]'s patch go on the master branch? (I only put it on
branch-2.0=>branch-2, under HBASE-21113.)
> The timeout retry logic for several procedures are broken after master
> restarts
> -------------------------------------------------------------------------------
>
> Key: HBASE-21095
> URL: https://issues.apache.org/jira/browse/HBASE-21095
> Project: HBase
> Issue Type: Sub-task
> Components: amv2, proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.2
>
> Attachments: HBASE-21095-branch-2.0.patch, HBASE-21095-v1.patch,
> HBASE-21095-v2.patch, HBASE-21095.branch-2.0.001.patch, HBASE-21095.patch
>
>
> For TRSP (TransitRegionStateProcedure), and also RTP
> (RegionTransitionProcedure) in branch-2.0 and branch-2.1, if we fail to
> assign or unassign a region, we set the procedure to the WAITING_TIMEOUT
> state and rely on the ProcedureEvent in the RegionStateNode (RSN) to wake us
> up later. But after a master restart, we neither suspend the ProcedureEvent
> in the RSN nor add the procedure to the ProcedureEvent's suspend queue, so
> the procedure hangs forever because no one will ever wake it up.
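The following is a minimal sketch of the hang described above, under a heavily simplified model. The classes Proc and Event and the methods parkIfSuspended, wakeAll and wokenAfterRestart are hypothetical stand-ins for the concepts. They are not the HBase ProcedureEvent/RegionStateNode API. The sketch shows that a procedure reloaded in WAITING_TIMEOUT is woken only if the restart path suspends the event again and re-parks the procedure on it.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified model of the wake-up mechanism; NOT the HBase API.
public class SuspendQueueSketch {

    // Stand-in for a procedure sitting in WAITING_TIMEOUT.
    static final class Proc {
        boolean runnable = false;
    }

    // Stand-in for the ProcedureEvent kept in a RegionStateNode.
    static final class Event {
        private boolean suspended = false;
        private final Queue<Proc> waiting = new ArrayDeque<>();

        void suspend() { suspended = true; }

        // Park the procedure on this event only while the event is suspended.
        boolean parkIfSuspended(Proc p) {
            if (!suspended) return false;
            waiting.add(p);
            return true;
        }

        // Mark the event ready and make every parked procedure runnable.
        int wakeAll() {
            suspended = false;
            int woken = 0;
            Proc p;
            while ((p = waiting.poll()) != null) {
                p.runnable = true;
                woken++;
            }
            return woken;
        }
    }

    // Simulates reloading a WAITING_TIMEOUT procedure after a master restart.
    // restoreSuspension == false mirrors the bug: the rebuilt event is not
    // suspended and the procedure is not queued, so a later wake finds nothing.
    static boolean wokenAfterRestart(boolean restoreSuspension) {
        Event rebuilt = new Event();   // event state is rebuilt, not restored
        Proc reloaded = new Proc();    // procedure loaded back from the store
        if (restoreSuspension) {
            rebuilt.suspend();
            rebuilt.parkIfSuspended(reloaded);
        }
        rebuilt.wakeAll();             // the later wake-up attempt
        return reloaded.runnable;
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + wokenAfterRestart(false));
        System.out.println("with fix: " + wokenAfterRestart(true));
    }
}
```

Running main prints `without fix: false` and then `with fix: true`.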