[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502662#comment-16502662
 ] 

stack commented on HBASE-20657:
-------------------------------

Thanks [~sergey.soldatov] (and @ankit singhai) for the digging.

From the description {{.... When we are processing 
MODIFY_TABLE_REOPEN_ALL_REGIONS we create MoveRegionProcedure for each region, 
but it checks whether the region is online (and it's not)....}} ...

So, we can't "move" a region that is offline. That seems reasonable. And if 
you didn't close a region, you shouldn't be doing its reopen.... So, MP failing 
seems right? Do we need to have MODIFY_TABLE_REOPEN_ALL_REGIONS deal with this? 
Or should I make a reopen procedure... one that gets the lock on the region and 
does the reopen... would remove a layer of procedures too... not making use of MTP.

bq. Another question is related to (1) from the description. Is it expected, 
that during the retry from client we generate a new nonce key for the same 
procedure?

That's broken. I believe [[email protected]] found this... Let's fix.

Let me assign myself for now...  If you don't mind [~sergey.soldatov]

> Retrying RPC call for ModifyTableProcedure may get stuck
> --------------------------------------------------------
>
>                 Key: HBASE-20657
>                 URL: https://issues.apache.org/jira/browse/HBASE-20657
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, proc-v2
>    Affects Versions: 2.0.0
>            Reporter: Sergey Soldatov
>            Assignee: Sergey Soldatov
>            Priority: Major
>             Fix For: 2.0.1
>
>         Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: The active master is killed while a ModifyTableProcedure 
> is executing. 
> If the table has enough regions, some of them may still be closed when the 
> secondary master becomes active. Once the client retries the call against the 
> new active master, a new ModifyTableProcedure is created and gets stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we retry from the client side, we call modifyTableAsync, which 
> creates a procedure with a new nonce key:
> {noformat}
>   ModifyTableRequest request = RequestConverter.buildModifyTableRequest(
>       td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's treated as a new procedure and starts 
> executing immediately.
> 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [[email protected]] saw a similar case where two concurrent ModifyTable 
> procedures were running and got stuck in a similar way.
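
To illustrate point (1) of the description, here is a minimal toy model of nonce-based deduplication. It is not HBase code: ToyMaster, submitModifyTable and procedureCount are hypothetical names invented for this sketch, and it assumes the server still remembers the nonce key when the retry arrives. It only shows why a retry that reuses the original nonce maps back to the same procedure, while a retry that calls something like ng.newNonce() again is seen as a brand new procedure.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of nonce-based request deduplication. Not HBase code. */
class NonceRetrySketch {

    /** Hypothetical stand-in for a master that tracks the nonce keys it has seen. */
    static final class ToyMaster {
        private final Map<String, Long> procIdByNonceKey = new HashMap<>();
        private long nextProcId = 1;

        /**
         * Returns the id of the procedure already registered under this
         * (nonceGroup, nonce) pair, or registers a new procedure if the pair is unknown.
         */
        long submitModifyTable(long nonceGroup, long nonce) {
            String nonceKey = nonceGroup + ":" + nonce;
            Long existing = procIdByNonceKey.get(nonceKey);
            if (existing != null) {
                return existing; // duplicate submission: no second procedure
            }
            long procId = nextProcId++;
            procIdByNonceKey.put(nonceKey, procId);
            return procId;
        }

        /** Number of distinct procedures created so far. */
        int procedureCount() {
            return procIdByNonceKey.size();
        }
    }

    public static void main(String[] args) {
        ToyMaster master = new ToyMaster();
        long nonceGroup = 7L; // arbitrary illustrative values
        long nonce = 100L;

        long first = master.submitModifyTable(nonceGroup, nonce);
        // Retry that reuses the original nonce: mapped to the same procedure.
        long retrySameNonce = master.submitModifyTable(nonceGroup, nonce);
        // Retry that generates a fresh nonce: treated as a new procedure.
        long retryNewNonce = master.submitModifyTable(nonceGroup, nonce + 1);

        System.out.println("first=" + first);                   // first=1
        System.out.println("retrySameNonce=" + retrySameNonce); // retrySameNonce=1
        System.out.println("retryNewNonce=" + retryNewNonce);   // retryNewNonce=2
        System.out.println("procedures=" + master.procedureCount()); // procedures=2
    }
}
```

In this model the second procedure is the one that would go on to hit the already-closed regions described in point (2).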



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
