[ https://issues.apache.org/jira/browse/HBASE-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787254#comment-17787254 ]

Duo Zhang edited comment on HBASE-28210 at 11/17/23 3:09 PM:
-------------------------------------------------------------

Checked the log again; I think this is a more general problem...

If a procedure schedules a batch of sub procedures, the sub procedures may be
executed concurrently, so the order in which they are added as rollback steps
can differ from the order in which they are later persisted. As a result, a
procedure with a greater stack id may have been persisted successfully while a
procedure with a smaller stack id has not. When the procedures are loaded,
there can then be holes in the stack ids, which can make the loader treat the
procedures as corrupted.
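A minimal Java sketch of the ordering problem, assuming a simplified model: a sub procedure's stack id is its position in the parent's rollback steps, persistence writes complete in an arbitrary order, and a crash stops persistence partway through. The class and method names (`StackIdHoleDemo`, `findHoles`, `persistedAfterCrash`) are hypothetical and are not HBase APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * Hypothetical model of the race described above; not HBase code.
 * Stack ids follow the order in which sub procedures are added as rollback
 * steps, but the writes that persist them may complete in another order.
 */
public class StackIdHoleDemo {

    /** Returns the stack ids in [0, max] that are missing from the persisted set. */
    static List<Integer> findHoles(List<Integer> persistedStackIds) {
        List<Integer> holes = new ArrayList<>();
        if (persistedStackIds.isEmpty()) {
            return holes;
        }
        TreeSet<Integer> ids = new TreeSet<>(persistedStackIds);
        for (int id = 0; id <= ids.last(); id++) {
            if (!ids.contains(id)) {
                holes.add(id);
            }
        }
        return holes;
    }

    /**
     * Simulates a crash: only the first persistedBeforeCrash writes, in
     * completion order, reached the store. Returns the surviving stack ids.
     */
    static List<Integer> persistedAfterCrash(List<String> rollbackOrder,
                                             List<String> persistOrder,
                                             int persistedBeforeCrash) {
        // Assumed model: stack id = position in the rollback steps.
        Map<String, Integer> stackIdOf = new HashMap<>();
        for (int i = 0; i < rollbackOrder.size(); i++) {
            stackIdOf.put(rollbackOrder.get(i), i);
        }
        List<Integer> persisted = new ArrayList<>();
        for (int i = 0; i < persistedBeforeCrash; i++) {
            persisted.add(stackIdOf.get(persistOrder.get(i)));
        }
        return persisted;
    }

    public static void main(String[] args) {
        List<String> rollbackOrder = List.of("sub-A", "sub-B", "sub-C");
        // sub-C's write completes first, then sub-A's; the crash hits before sub-B's.
        List<String> persistOrder = List.of("sub-C", "sub-A", "sub-B");
        List<Integer> persisted = persistedAfterCrash(rollbackOrder, persistOrder, 2);
        System.out.println("persisted stack ids: " + persisted);      // [2, 0]
        System.out.println("holes on load: " + findHoles(persisted)); // [1]
    }
}
```

In this model the store ends up with stack ids 0 and 2 but not 1, which is the kind of hole the comment describes seeing when procedures are loaded.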


(Author: apache9)

> Should not add procedure to rollback step when it is suspended
> --------------------------------------------------------------
>
>                 Key: HBASE-28210
>                 URL: https://issues.apache.org/jira/browse/HBASE-28210
>             Project: HBase
>          Issue Type: Bug
>          Components: master, proc-v2
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Blocker
>             Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 2.5.7
>
>
> Found this while implementing HBASE-28199. After HBASE-28199 we suspend
> procedures much more often, which exercises a previously missed scenario and
> makes some UTs fail with corrupted procedures when loading.
> I think this issue should be fixed separately, as it affects all active
> branches.
> Let me try to implement a UT first.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
