stack created HBASE-20152:
Summary: [AMv2] DisableTableProcedure versus ServerCrashProcedure
Issue Type: Bug
Seeing a small spate of issues where disabled tables/regions are being
assigned. Usually they happen when a DisableTableProcedure is running
concurrently with a ServerCrashProcedure (SCP). See below. See associated HBASE-20131.
This is the umbrella issue for fixing them.
From HBASE-20137, 'TestRSGroups is Flakey':
* SCP is running because a server was aborted in test.
* SCP starts AssignProcedure of region X from crashed server.
* DisableTable Procedure runs because test has finished and we're doing table
* UnassignProcedure for region X.
* Disable Unassign gets Lock on region X first.
* SCP AssignProcedure tries to get lock, waits on lock.
* DisableTable Procedure UnassignProcedure RPC fails because server is down
(That's why the SCP is running).
* Tries to expire the server it failed the RPC against. Fails (currently being
* DisableTable Procedure Unassign is suspended. It is suspended with the lock on
region X held.
* SCP can't run because the lock on X is held.
* Test times out.
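
The sequence above can be replayed in a small toy model. This is only a sketch: RegionLockSketch and its methods are hypothetical names, not HBase classes, and a single-permit java.util.concurrent.Semaphore stands in for the procedure-level lock on region X. The second scenario (dropping the lock on suspend) is shown purely for contrast and is an assumption, not the fix proposed here.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model only: none of these names are HBase classes.
public class RegionLockSketch {
    // One permit stands in for the exclusive lock on region X.
    private final Semaphore regionLock = new Semaphore(1);

    /** The Disable-side unassign takes the region lock; true if it got it. */
    boolean unassignAcquires() {
        return regionLock.tryAcquire();
    }

    /** Models the unassign giving up the region lock. */
    void unassignReleases() {
        regionLock.release();
    }

    /** The SCP-side assign waits a bounded time for the region lock. */
    boolean assignAcquires(long timeoutMillis) {
        try {
            return regionLock.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Unassign suspends while still holding the lock, as in the report. */
    public static boolean assignProceedsWhenLockHeldAcrossSuspend() {
        RegionLockSketch s = new RegionLockSketch();
        s.unassignAcquires();        // Disable's unassign wins the lock
        // RPC to the dead server fails; unassign suspends WITHOUT releasing.
        return s.assignAcquires(50); // SCP's assign waits, then gives up
    }

    /** Hypothetical contrast: the lock is dropped before suspending. */
    public static boolean assignProceedsWhenLockReleasedOnSuspend() {
        RegionLockSketch s = new RegionLockSketch();
        s.unassignAcquires();
        s.unassignReleases();        // assumption: release on suspend
        return s.assignAcquires(50);
    }

    public static void main(String[] args) {
        System.out.println("held across suspend: "
            + assignProceedsWhenLockHeldAcrossSuspend());
        System.out.println("released on suspend: "
            + assignProceedsWhenLockReleasedOnSuspend());
    }
}
```

In the first scenario the assign never gets the lock, which in the real test shows up as the SCP being unable to make progress until the test times out.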
h2. Delete of online Regions
Saw this in nightly failure #452 for branch-2 in
* DisableTableProcedure is queued before SCP.
* DisableTableProcedure Unassign fails because can't RPC to crashed server and
* Unassign is stuck in suspend.
* SCP runs and cleans up suspended Disable Unassign.
* SCP completes which includes assign of Disable Unassign region.
* Disable Unassign completes
* Disable completes.
* A scheduled Drop Table Procedure runs (it's the end of the test).
* Succeeds deleting regions that are actually assigned (see above where SCP