[
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16563103#comment-16563103
]
Guanghao Zhang commented on HBASE-20657:
----------------------------------------
bq. I don't know a reason why this shouldn't also be applied to 2.x
[~elserj] I opened an issue, HBASE-20713, about this. The solution in the master
branch is not the final solution, so I didn't apply it to the 2.* branches.
> Retrying RPC call for ModifyTableProcedure may get stuck
> --------------------------------------------------------
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
> Issue Type: Bug
> Components: Client, proc-v2
> Affects Versions: 2.0.0
> Reporter: Sergey Soldatov
> Assignee: stack
> Priority: Critical
> Fix For: 3.0.0, 2.0.2, 2.1.1
>
> Attachments: HBASE-20657-1-branch-2.patch,
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch,
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS.
> Steps to reproduce: The active master is killed while a ModifyTableProcedure
> is executing.
> If the table has enough regions, some of them may already be closed by the
> time the secondary master becomes active. When the client then retries the
> call against the new active master, a new ModifyTableProcedure is created and
> gets stuck during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we retry from the client side, we call modifyTableAsync, which
> creates a procedure with a new nonce key:
> {noformat}
> ModifyTableRequest request =
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
> So on the server side, it's treated as a new procedure and starts
> executing immediately.
> 2. When we process MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a
> MoveRegionProcedure for each region, but each one checks whether its region is
> online (and it's not), so it fails immediately, forcing the procedure to
> restart.
> [[email protected]] saw a similar case where two concurrent ModifyTable
> procedures were running and got stuck in a similar way.
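
To illustrate point 1 of the description, here is a minimal sketch of why a fresh nonce on retry yields a second procedure while a reused nonce would not. This is not HBase code: `NonceRetrySketch`, `SketchMaster`, `submitModifyTable`, and `procedureCount` are hypothetical names, and the sketch assumes the nonce-to-procedure mapping is still available to whichever master handles the retry. It is also not meant to describe the fix in the attached patches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; none of these types exist in HBase.
class NonceRetrySketch {

    /** Stand-in for a master that deduplicates submissions by (nonceGroup, nonce). */
    static final class SketchMaster {
        private final Map<String, Long> procIdByNonceKey = new ConcurrentHashMap<>();
        private final AtomicLong nextProcId = new AtomicLong(1);

        /** Returns the existing procedure id for a known nonce key, else registers a new one. */
        long submitModifyTable(long nonceGroup, long nonce) {
            String key = nonceGroup + ":" + nonce;
            return procIdByNonceKey.computeIfAbsent(key, k -> nextProcId.getAndIncrement());
        }

        /** Number of distinct procedures created so far. */
        int procedureCount() {
            return procIdByNonceKey.size();
        }
    }

    public static void main(String[] args) {
        SketchMaster master = new SketchMaster();
        long nonceGroup = 7L;

        // Original call.
        long first = master.submitModifyTable(nonceGroup, 100L);
        // Retry with a fresh nonce, as in the reported behavior: seen as a new procedure.
        long retryWithNewNonce = master.submitModifyTable(nonceGroup, 101L);
        // Retry that reuses the original nonce: mapped back to the first procedure.
        long retryWithSameNonce = master.submitModifyTable(nonceGroup, 100L);

        System.out.println("new nonce created a second procedure: " + (first != retryWithNewNonce));
        System.out.println("same nonce reused the first procedure: " + (first == retryWithSameNonce));
        System.out.println("procedures created: " + master.procedureCount());
    }
}
```

In this sketch only the retry that reuses the original nonce is recognized as a duplicate; the retry with a fresh nonce starts a second procedure that then competes with the first one.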