[
https://issues.apache.org/jira/browse/HBASE-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787254#comment-17787254
]
Duo Zhang edited comment on HBASE-28210 at 11/17/23 3:09 PM:
-------------------------------------------------------------
Checked the log again; I think this is a more general problem...
If a procedure schedules a bunch of sub-procedures, the sub-procedures can be
executed concurrently, so the order in which they are added as rollback steps
can differ from the order in which they are later persisted. As a result, a
procedure with a greater stack id may have been persisted successfully while a
procedure with a smaller stack id has not. So when loading procedures, there
could be holes in the stack ids, which can cause trouble.
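A minimal Java sketch of the ordering problem, assuming a simplified model: stack ids are handed out in the order sub-procedures are added as rollback steps, persistence completes in a different order, and a crash happens after only some writes. `SubProc`, `addRollbackSteps`, `persistedIds` and `firstHole` are hypothetical names for illustration, not HBase's actual procedure-v2 classes or methods.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class StackIdHoleDemo {

    // Hypothetical stand-in for a sub-procedure (not the real HBase Procedure class).
    static final class SubProc {
        final String name;
        int stackId = -1;

        SubProc(String name) {
            this.name = name;
        }
    }

    // Assigns stack ids in the order the sub-procedures are added as rollback steps.
    static void addRollbackSteps(List<SubProc> additionOrder) {
        int next = 0;
        for (SubProc p : additionOrder) {
            p.stackId = next++;
        }
    }

    // Simulates a crash after only the first `persistedCount` writes completed,
    // where writes complete in `persistOrder` (which may differ from addition order).
    static SortedSet<Integer> persistedIds(List<SubProc> persistOrder, int persistedCount) {
        SortedSet<Integer> ids = new TreeSet<>();
        for (int i = 0; i < persistedCount; i++) {
            ids.add(persistOrder.get(i).stackId);
        }
        return ids;
    }

    // On load, the stack ids are expected to be 0..max with no gaps.
    // Returns the first missing id, or -1 if the ids are contiguous.
    static int firstHole(SortedSet<Integer> ids) {
        int expected = 0;
        for (int id : ids) {
            if (id != expected) {
                return expected;
            }
            expected++;
        }
        return -1;
    }

    public static void main(String[] args) {
        SubProc a = new SubProc("a");
        SubProc b = new SubProc("b");
        SubProc c = new SubProc("c");

        // Concurrent execution: added as rollback steps in the order a, b, c ...
        addRollbackSteps(List.of(a, b, c));
        // ... but persisted in the order c, a, b.
        List<SubProc> persistOrder = List.of(c, a, b);

        // Crash after two writes: ids 2 and 0 are on disk, id 1 is not.
        SortedSet<Integer> onDisk = persistedIds(persistOrder, 2);
        System.out.println("persisted stack ids: " + onDisk);
        System.out.println("first hole: " + firstHole(onDisk));
    }
}
```

Running `main` prints the persisted ids `[0, 2]` and reports stack id 1 as the hole: the procedure with the greater stack id made it to disk while the one with the smaller id did not. If the writes had completed in addition order, the same crash would leave `[0, 1]` and no hole.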
> Should not add procedure to rollback step when it is suspended
> --------------------------------------------------------------
>
> Key: HBASE-28210
> URL: https://issues.apache.org/jira/browse/HBASE-28210
> Project: HBase
> Issue Type: Bug
> Components: master, proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Blocker
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 2.5.7
>
>
> Found this when implementing HBASE-28199: after HBASE-28199 we will
> suspend procedures a lot, so a previously missed scenario is now exercised,
> and it fails some UTs with corrupted procedures when loading.
> I think this issue should be fixed separately as it affects all active
> branches.
> Let me try to implement a UT first.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)