[
https://issues.apache.org/jira/browse/HBASE-20893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559861#comment-16559861
]
stack edited comment on HBASE-20893 at 7/27/18 9:10 PM:
--------------------------------------------------------
[~allan163] Yeah, support for rollback at the split procedure level was added,
but rollback invokes rolling back of subprocedures, and this does not work if
the subprocedure is unassign/assign. A hack was added to do async
assign/unassign up in split/merge, but this runs out-of-band, not as part of
the rollback, until rollback gets more love and is finished.
Yeah, unsupported exceptions and CODE-BUG, as well as crashed-out procedures,
are scary.
bq. But. since a exception is thrown, the decrease for stateCount never happen.
Let's fix in a new issue?
Do you have a problem with this patch? It avoids CODE-BUG and skips use of
rollback with its hack async assign/unassign. It is also less *violent* than
what was here previously, just re-running a step rather than flipping to
(dodgy) rollback waiting on new procedure scheduling. Your unit tests now
conclude with successful merges and splits, where before they finished at
rollback, not with successful split/merge procedure completion.
I'm in here because my long-running tests are failing and I thought this was
the cause... (Now I don't think it is, but we should clean up the mess it
makes.)
Thanks.
was (Author: stack):
[~allan163] Yeah, support for rollback at the split procedure level was added
but rollback invokes rolling back of subprocedures and this does not work if
subprocedure is unassign/assign. A hack was added to do async assign/unassign
which run out-of-band as part of rollback until rollback got love and finish.
Unsupported exceptions and CODE-BUG as well as crashed out procedure are scary.
bq. But. since a exception is thrown, the decrease for stateCount never happen.
Lets fix, in a new issue?
Do you have a problem with this patch? It avoids CODE-BUG and skips use of
rollback with the hack async assign/unassign. It also is less *violent* than
what was here previous just re-running a step rather than flipping to (dodgy)
rollback waiting on new procedure scheduling.
I'm in here because my long-running tests are failing and this looks to be the
cause.
> Data loss if splitting region while ServerCrashProcedure executing
> ------------------------------------------------------------------
>
> Key: HBASE-20893
> URL: https://issues.apache.org/jira/browse/HBASE-20893
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 3.0.0, 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: HBASE-20893-branch-2.0.addendum.patch,
> HBASE-20893.branch-2.0.001.patch, HBASE-20893.branch-2.0.002.patch,
> HBASE-20893.branch-2.0.003.patch, HBASE-20893.branch-2.0.004.patch,
> HBASE-20893.branch-2.0.005.patch
>
>
> Similar case as HBASE-20878.