[
https://issues.apache.org/jira/browse/HBASE-20173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HBASE-20173:
--------------------------
Description:
See 'Deadlock' scenario in parent issue. Doing as focused subtask since parent
has a few things going on in it.
Let me reproduce it below:
From HBASE-20137, 'TestRSGroups is Flakey',
https://issues.apache.org/jira/browse/HBASE-20137?focusedCommentId=16390325&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16390325
* SCP is running because a server was aborted in test.
* SCP starts AssignProcedure of region X from crashed server.
* DisableTable Procedure runs because test has finished and we're doing table
delete. It queues an UnassignProcedure for region X.
* Disable Unassign gets Lock on region X first.
* SCP AssignProcedure tries to get lock, waits on lock.
* DisableTable Procedure UnassignProcedure RPC fails because server is down
(that's why the SCP is running).
* The Unassign tries to expire the server it failed the RPC against. That fails
(the server is currently being SCP'd).
* DisableTable Procedure Unassign is suspended. It suspends with the lock on
region X still held.
* SCP can't run because the lock on X is held.
* Test times out.
was: See 'Deadlock' scenario in parent issue. Doing as focused subtask since
parent has a few things going on in it.
> [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock
> ----------------------------------------------------------------------------
>
> Key: HBASE-20173
> URL: https://issues.apache.org/jira/browse/HBASE-20173
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-20173.branch-2.001.patch,
> HBASE-20173.branch-2.002.patch
>
>
> See 'Deadlock' scenario in parent issue. Doing as focused subtask since
> parent has a few things going on in it.
> Let me reproduce it below:
> From HBASE-20137, 'TestRSGroups is Flakey',
> https://issues.apache.org/jira/browse/HBASE-20137?focusedCommentId=16390325&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16390325
> * SCP is running because a server was aborted in test.
> * SCP starts AssignProcedure of region X from crashed server.
> * DisableTable Procedure runs because test has finished and we're doing
> table delete. It queues an UnassignProcedure for region X.
> * Disable Unassign gets Lock on region X first.
> * SCP AssignProcedure tries to get lock, waits on lock.
> * DisableTable Procedure UnassignProcedure RPC fails because server is down
> (that's why the SCP is running).
> * The Unassign tries to expire the server it failed the RPC against. That
> fails (the server is currently being SCP'd).
> * DisableTable Procedure Unassign is suspended. It suspends with the lock on
> region X still held.
> * SCP can't run because the lock on X is held.
> * Test times out.
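
The sequence above can be replayed with a toy model, which may make the
lock-holding problem easier to see. This is only a sketch under stated
assumptions: RegionLock, scpAssignCanRun and the releaseLockOnSuspend flag are
hypothetical names invented for illustration, not classes or options from
HBase's procedure framework, and the flag is not a description of what the
attached patches do.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy replay of the scenario; none of these types are HBase classes. */
public class DeadlockSketch {

    /** Hypothetical stand-in for the exclusive lock on a single region. */
    static final class RegionLock {
        private String owner; // null while the lock is free

        boolean tryAcquire(String procedure) {
            if (owner != null) {
                return false; // someone else already holds the lock
            }
            owner = procedure;
            return true;
        }

        void release(String procedure) {
            if (procedure.equals(owner)) {
                owner = null;
            }
        }
    }

    /**
     * Replays the steps in order and returns true if the SCP's assign ends up
     * with the lock on region X. The releaseLockOnSuspend flag is an
     * assumption used only to contrast the two outcomes.
     */
    public static boolean scpAssignCanRun(boolean releaseLockOnSuspend, List<String> trace) {
        RegionLock regionX = new RegionLock();
        String unassign = "DisableTable/Unassign";
        String assign = "SCP/Assign";

        // The unassign queued by DisableTable gets the lock on region X first.
        regionX.tryAcquire(unassign);
        trace.add(unassign + " holds lock on X");

        // The assign started by the SCP now has to wait.
        if (!regionX.tryAcquire(assign)) {
            trace.add(assign + " waits on lock on X");
        }

        // The unassign RPC fails, expiring the server fails, the unassign suspends.
        trace.add(unassign + " suspended after failed RPC");
        if (releaseLockOnSuspend) {
            regionX.release(unassign);
        }

        // The SCP's assign retries the lock.
        boolean acquired = regionX.tryAcquire(assign);
        trace.add(acquired ? assign + " acquired lock on X" : assign + " still blocked");
        return acquired;
    }

    public static void main(String[] args) {
        for (boolean release : new boolean[] {false, true}) {
            List<String> trace = new ArrayList<>();
            boolean ok = scpAssignCanRun(release, trace);
            System.out.println("releaseLockOnSuspend=" + release + " -> assign runs: " + ok);
            trace.forEach(step -> System.out.println("  " + step));
        }
    }
}
```

With the flag set to false the replay ends with the assign still blocked, which
corresponds to the test timing out. Setting it to true only shows that the
cycle breaks once the suspended procedure no longer holds the region lock.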