stack created HBASE-20152:

             Summary: [AMv2] DisableTableProcedure versus ServerCrashProcedure
                 Key: HBASE-20152
             Project: HBase
          Issue Type: Bug
          Components: amv2
            Reporter: stack
            Assignee: stack

Seeing a small spate of issues where disabled tables/regions are being 
assigned. Usually they happen when a DisableTableProcedure is running 
concurrently with a ServerCrashProcedure (SCP). See below. See also the 
associated HBASE-20131. This is the umbrella issue for fixing them.

.h2 Deadlock
From HBASE-20137, 'TestRSGroups is Flakey':

 * SCP is running because a server was aborted in test.
 * SCP starts AssignProcedure of region X from crashed server.
 * DisableTable Procedure runs because test has finished and we're doing table 
delete. Queues UnassignProcedure for region X.
 * Disable Unassign gets Lock on region X first.
 * SCP AssignProcedure tries to get lock, waits on lock.
 * DisableTable Procedure UnassignProcedure RPC fails because server is down 
(That's why the SCP is running).
 * Tries to expire the server it failed the RPC against. Fails (currently being 
 * DisableTable Procedure Unassign is suspended. It is suspended with the lock 
on region X still held.
 * SCP can't run because lock on X is held
 * Test times out.
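
The interleaving above can be replayed with a small toy model. This is only a sketch for reasoning about the lock ordering: the class, method, and procedure names are hypothetical and none of them are HBase APIs. It also takes the failed RPC and the failed expire as given, as the list does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are HBase classes or APIs.
public class RegionLockDeadlockSketch {
    // Region name -> procedure currently holding its exclusive lock.
    private final Map<String, String> lockHolder = new HashMap<>();
    private final List<String> trace = new ArrayList<>();

    // Returns true if proc now holds the lock on region, false if it must wait.
    public boolean tryLock(String region, String proc) {
        String holder = lockHolder.putIfAbsent(region, proc);
        boolean acquired = (holder == null) || holder.equals(proc);
        trace.add(acquired
            ? proc + " holds lock on " + region
            : proc + " waits on " + region + ", held by " + holder);
        return acquired;
    }

    // Replays the interleaving from the list above.
    // Returns true when neither procedure can make progress.
    public boolean replay() {
        // Disable's Unassign gets the lock on region X first.
        boolean unassignLocked = tryLock("X", "Disable/Unassign");
        // SCP's Assign asks for the same lock and has to wait.
        boolean assignLocked = tryLock("X", "SCP/Assign");
        // Taken as given from the report: the Unassign RPC fails because the
        // server is down, and the attempt to expire that server fails too.
        boolean rpcSucceeded = false;
        boolean expireSucceeded = false;
        // Unassign suspends WITHOUT releasing the lock on X.
        boolean unassignSuspended =
            unassignLocked && !rpcSucceeded && !expireSucceeded;
        if (unassignSuspended) {
            trace.add("Disable/Unassign suspended, still holding lock on X");
        }
        // In this model SCP's Assign is blocked on the very lock that the
        // suspended Unassign holds, so nothing moves.
        return unassignSuspended && !assignLocked;
    }

    public List<String> trace() {
        return trace;
    }

    public static void main(String[] args) {
        RegionLockDeadlockSketch sketch = new RegionLockDeadlockSketch();
        boolean stuck = sketch.replay();
        sketch.trace().forEach(System.out::println);
        System.out.println("deadlocked = " + stuck);
    }
}
```

The point the model makes is that the suspend happens with the region lock still held, so the procedure waiting on that lock never gets to run.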

.h2 Delete of online Regions
Saw this in nightly failure #452 for branch-2 in

 * DisableTableProcedure is queued before SCP.
 * DisableTableProcedure Unassign fails because it can't RPC to the crashed 
server and can't expire it.
 * Unassign is stuck in suspend.
 * SCP runs and cleans up suspended Disable Unassign.
 * SCP completes which includes assign of Disable Unassign region.
 * Disable Unassign completes
 * Disable completes.
 * A scheduled Drop Table Procedure runs (it's the end of the test).
 * Succeeds deleting regions that are actually assigned (see above where SCP 
assigned region).
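
The sequence above can likewise be replayed as a toy state model. Again this is only a sketch under stated assumptions: the enum, the fields, and the idea that the drop consults only the table's disabled flag are hypothetical simplifications, not HBase behavior confirmed by this report.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: the enum and method names are hypothetical, not HBase APIs.
public class DeleteOnlineRegionSketch {
    public enum RegionState { OPEN, CLOSED }

    private RegionState regionX = RegionState.OPEN; // X lived on the crashed server
    private boolean tableDisabled = false;
    private final List<String> trace = new ArrayList<>();

    // Replays the sequence above for a single region X.
    // Returns true if the drop deleted X while it was still assigned.
    public boolean replay() {
        trace.add("Disable/Unassign suspended: RPC failed, expire failed");
        // SCP cleans up the suspended Unassign and assigns X during recovery.
        regionX = RegionState.OPEN;
        trace.add("SCP assigned X");
        // Unassign and then Disable complete; in this model nothing closes X again.
        tableDisabled = true;
        trace.add("Disable completed");
        // Assumption: the drop checks only the table flag before deleting regions.
        boolean deletedWhileOpen = tableDisabled && regionX == RegionState.OPEN;
        trace.add("Drop deleted X, state was " + regionX);
        return deletedWhileOpen;
    }

    public List<String> trace() {
        return trace;
    }

    public static void main(String[] args) {
        DeleteOnlineRegionSketch sketch = new DeleteOnlineRegionSketch();
        boolean bad = sketch.replay();
        sketch.trace().forEach(System.out::println);
        System.out.println("deleted an assigned region = " + bad);
    }
}
```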
