[ 
https://issues.apache.org/jira/browse/HBASE-20893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559861#comment-16559861
 ] 

stack edited comment on HBASE-20893 at 7/27/18 9:10 PM:
--------------------------------------------------------

[~allan163] Yeah, support for rollback at the split procedure level was added,
but rollback invokes rollback of subprocedures, and this does not work if the
subprocedure is an unassign/assign. A hack was added to do async assign/unassign
up in split/merge, but this runs out-of-band, not as part of the rollback, until
rollback gets more love and is finished.

Yeah, unsupported exceptions and CODE-BUG, as well as crashed-out procedures,
are scary.

bq. But. since a exception is thrown, the decrease for stateCount never happen.

Let's fix in a new issue?

Do you have a problem with this patch? It avoids CODE-BUG and skips use of
rollback with its hack async assign/unassign. It is also less *violent* than
what was here previously: it just re-runs a step rather than flipping to (dodgy)
rollback waiting on new procedure scheduling. Your unit tests now conclude with
successful merges and splits, where before they finished at rollback, not with
successful split/merge procedure completion.

I'm in here because my long-running tests are failing and I thought this was
the cause... (Now I don't think it is, but we should clean up the mess it makes.)

Thanks.




was (Author: stack):
[~allan163] Yeah, support for rollback at the split procedure level was added 
but rollback invokes rolling back of subprocedures and this does not work if 
subprocedure is unassign/assign. A hack was added to do async assign/unassign 
which run out-of-band as part of rollback until rollback got love and finish.

Unsupported exceptions and CODE-BUG as well as crashed out procedure are scary.

bq. But. since a exception is thrown, the decrease for stateCount never happen.

Lets fix, in a new issue?

Do you have a problem with this patch? It avoids CODE-BUG and skips use of 
rollback with the hack async assign/unassign. It also is less *violent* than 
what was here previous just re-running a step rather than flipping to (dodgy) 
rollback waiting on new procedure scheduling. 

I'm in here because my long-running tests are failing and this looks to be the 
cause.



> Data loss if splitting region while ServerCrashProcedure executing
> ------------------------------------------------------------------
>
>                 Key: HBASE-20893
>                 URL: https://issues.apache.org/jira/browse/HBASE-20893
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0, 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
>         Attachments: HBASE-20893-branch-2.0.addendum.patch, 
> HBASE-20893.branch-2.0.001.patch, HBASE-20893.branch-2.0.002.patch, 
> HBASE-20893.branch-2.0.003.patch, HBASE-20893.branch-2.0.004.patch, 
> HBASE-20893.branch-2.0.005.patch
>
>
> Similar case as HBASE-20878.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
