[jira] [Commented] (HBASE-21095) The timeout retry logic for several procedures are broken after master restarts

stack (JIRA) Fri, 24 Aug 2018 12:54:09 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-21095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592108#comment-16592108
 ]


stack commented on HBASE-21095:
-------------------------------

On your patch [~Apache9], I see how it integrates the [~allan163] patch. Looks 
reasonable. +1 to commit. For branch-2, it'll fail after HBASE-21113... but 
maybe you can massage it in.

> The timeout retry logic for several procedures are broken after master 
> restarts
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-21095
>                 URL: https://issues.apache.org/jira/browse/HBASE-21095
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2, proc-v2
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 3.0.0, 2.2.0
>
>         Attachments: HBASE-21095-branch-2.0.patch, HBASE-21095-v1.patch, 
> HBASE-21095-v2.patch, HBASE-21095.branch-2.0.001.patch, HBASE-21095.patch
>
>
> For TRSP (TransitRegionStateProcedure), and also RTP 
> (RegionTransitionProcedure) in branch-2.0 and branch-2.1, if we fail to 
> assign or unassign a region, we set the procedure to the WAITING_TIMEOUT 
> state and rely on the ProcedureEvent in RegionStateNode (RSN) to wake us up 
> later. But after the master restarts, we do not suspend the ProcedureEvent in 
> the RSN, nor do we add the procedure to the ProcedureEvent's suspend queue, 
> so the procedure hangs forever because nothing will wake it up.
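
To make the hang described above concrete, here is a minimal Java sketch. It 
is a toy model under stated assumptions, not HBase code: ToyEvent, 
failThenWake and restartThenWake are hypothetical names invented for 
illustration, and procedures are represented as plain strings. It only shows 
why a wake-up finds nothing to resume when the event is rebuilt after a 
restart without the waiting procedure being parked on it again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class WakeupSketch {
    /** Hypothetical stand-in for an event that procedures can wait on. */
    static final class ToyEvent {
        private boolean suspended = false;
        private final Deque<String> waiters = new ArrayDeque<>();

        /** Marks the event as not ready, so procedures may park on it. */
        void suspend() {
            suspended = true;
        }

        /** Parks the procedure only if the event is currently suspended. */
        boolean suspendIfNotReady(String proc) {
            if (!suspended) {
                return false;
            }
            waiters.add(proc);
            return true;
        }

        /** Marks the event ready and returns every procedure that was parked. */
        List<String> wake() {
            suspended = false;
            List<String> woken = new ArrayList<>(waiters);
            waiters.clear();
            return woken;
        }
    }

    /** No restart: the failing procedure suspends the event and parks on it. */
    static List<String> failThenWake(String proc) {
        ToyEvent event = new ToyEvent();
        event.suspend();
        event.suspendIfNotReady(proc);
        return event.wake();
    }

    /**
     * Restart: the procedure is assumed to be reloaded in a waiting state,
     * but the in-memory event is brand new, so it is neither suspended nor
     * holding the procedure in its queue.
     */
    static List<String> restartThenWake(String proc) {
        ToyEvent event = new ToyEvent();
        return event.wake();
    }

    public static void main(String[] args) {
        System.out.println(failThenWake("pid=1"));    // prints [pid=1]
        System.out.println(restartThenWake("pid=1")); // prints []
    }
}
```

In the second case the wake-up returns an empty list, which corresponds to 
the procedure staying in its waiting state forever. Under the same 
assumptions, a fix would need to restore both the suspended flag and the 
queue entry when the procedure is reloaded.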



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
