[ https://issues.apache.org/jira/browse/HBASE-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391969#comment-16391969 ]
Mike Drob commented on HBASE-20152:
-----------------------------------

bq. Move does its Assign... Region is onlined before WAL splitting completes.

right, would need to tie this somehow to the completion of SCP. Hmm

> [AMv2] DisableTableProcedure versus ServerCrashProcedure
> --------------------------------------------------------
>
>                 Key: HBASE-20152
>                 URL: https://issues.apache.org/jira/browse/HBASE-20152
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>            Reporter: stack
>            Assignee: stack
>            Priority: Major
>
> Seeing a small spate of issues where disabled tables/regions are being
> assigned. Usually they happen when a DisableTableProcedure is running
> concurrently with a ServerCrashProcedure. See below. See associated
> HBASE-20131. This is the umbrella issue for fixing.
>
> h3. Deadlock
> From HBASE-20137, 'TestRSGroups is Flakey',
> https://issues.apache.org/jira/browse/HBASE-20137?focusedCommentId=16390325&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16390325
> {code}
> * SCP is running because a server was aborted in test.
> * SCP starts AssignProcedure of region X from crashed server.
> * DisableTable Procedure runs because test has finished and we're doing table delete. Queues UnassignProcedure for region X.
> * Disable Unassign gets Lock on region X first.
> * SCP AssignProcedure tries to get lock, waits on lock.
> * DisableTable Procedure UnassignProcedure RPC fails because server is down (that's why the SCP).
> * Tries to expire the server it failed the RPC against. Fails (currently being SCP'd).
> * DisableTable Procedure Unassign is suspended. It is a suspend with lock on region X held.
> * SCP can't run because lock on X is held.
> * Test times out.
> {code}
>
> h3. Delete of online Regions
> Saw this in nightly failure #452 for branch-2 in
> TestSplitTransactionOnCluster.org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
> {code}
> * DisableTableProcedure is queued before SCP.
> * DisableTableProcedure Unassign fails because it can't RPC to the crashed server and can't expire it.
> * Unassign is stuck in suspend.
> * SCP runs and cleans up suspended Disable Unassign.
> * SCP completes, which includes assign of Disable Unassign region.
> * Disable Unassign completes.
> * Disable completes.
> * A scheduled Drop Table Procedure runs (it's end of test).
> * Succeeds deleting regions that are actually assigned (see above where SCP assigned region).
> {code}
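
The Deadlock sequence quoted above can be modelled with a toy lock table. This is a minimal sketch, not HBase code: `RegionLockSketch`, `tryLock`, `release`, and `simulate` are hypothetical names made up for illustration, and the `releaseOnSuspend` flag exists only to show that the lock held across the suspend is what blocks SCP. It is not the fix proposed in this issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of the region lock; none of this is HBase API.
public class RegionLockSketch {
    // Region name -> name of the procedure that currently owns its lock.
    private final Map<String, String> regionLocks = new HashMap<>();

    /** Returns true if the procedure owns the region lock after the call. */
    public boolean tryLock(String region, String procedure) {
        String owner = regionLocks.putIfAbsent(region, procedure);
        return owner == null || owner.equals(procedure);
    }

    /** Releases the lock only if the given procedure is the current owner. */
    public void release(String region, String procedure) {
        regionLocks.remove(region, procedure);
    }

    /** Replays the quoted steps for region X and reports what happens to SCP. */
    public static String simulate(boolean releaseOnSuspend) {
        RegionLockSketch locks = new RegionLockSketch();
        // Disable's Unassign gets the lock on region X first.
        locks.tryLock("X", "DisableUnassign");
        // Its RPC fails (server is down) and the server cannot be expired
        // (already being SCP'd), so the Unassign suspends.
        if (releaseOnSuspend) {
            // Assumption for contrast only: give the lock back while suspended.
            locks.release("X", "DisableUnassign");
        }
        // SCP's Assign now tries to take the same lock.
        boolean scpGotLock = locks.tryLock("X", "ScpAssign");
        return scpGotLock ? "SCP proceeds" : "SCP blocked on region X";
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // lock held across the suspend
        System.out.println(simulate(true));  // lock released on suspend
    }
}
```

With `releaseOnSuspend` set to `false` the sketch reproduces the reported state: the suspended Unassign still owns region X, so the SCP Assign can never acquire it.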