Sergey Soldatov updated HBASE-20657:
------------------------------------
    Summary: Retrying RPC call for ModifyTableProcedure may get stuck  (was: Retrying RPC call for ModifyTableProcedure may stuck)

> Retrying RPC call for ModifyTableProcedure may get stuck
> --------------------------------------------------------
>
>                 Key: HBASE-20657
>                 URL: https://issues.apache.org/jira/browse/HBASE-20657
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, proc-v2
>    Affects Versions: 2.0.0
>            Reporter: Sergey Soldatov
>            Priority: Major
>
> Env: 2 masters, 1 RS.
> Steps to reproduce: kill the active master while a ModifyTableProcedure is
> executing.
> If the table has enough regions, some of them may still be closed when the
> secondary master becomes active. When the client then retries the call against
> the new active master, a new ModifyTableProcedure is created and gets stuck
> while handling the MODIFY_TABLE_REOPEN_ALL_REGIONS state. That happens because:
> 1. When retrying on the client side, we call modifyTableAsync, which
> creates a procedure with a new nonce key:
> {noformat}
> ModifyTableRequest request = RequestConverter.buildModifyTableRequest(
>     td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
> So the server side treats it as a new procedure and starts executing it
> immediately.
> 2. While processing MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a
> MoveRegionProcedure for each region. MoveRegionProcedure checks whether the
> region is online; since it is not, it fails immediately, forcing the
> procedure to restart.
>
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable
> procedures were running and got stuck in a similar way.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
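Point 1 of the report hinges on nonce-based deduplication: the server can only recognize a retry as the same request if it carries the same (nonce group, nonce) pair. The toy Java model below is a sketch of that idea, not HBase code. `ToyMaster`, `submitModifyTable`, `NonceKey` and the retry helpers are hypothetical names, and the model assumes the nonce registry survives the master failover (a single shared map stands in for that). Generating a fresh nonce on every attempt, as the quoted `ng.newNonce()` call does, yields a second procedure; choosing the nonce once and reusing it returns the original procedure id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class NonceRetrySketch {
  // Hypothetical key type: a (nonceGroup, nonce) pair.
  record NonceKey(long group, long nonce) {}

  // Hypothetical toy master, NOT HBase's ProcedureExecutor. The map stands in
  // for a nonce registry that is assumed to survive master failover.
  static class ToyMaster {
    private final Map<NonceKey, Long> nonceToProcId = new HashMap<>();
    private long nextProcId = 1;

    // Returns the existing procId if this nonce key was already submitted;
    // otherwise "starts" a new procedure and returns its new id.
    long submitModifyTable(long group, long nonce) {
      return nonceToProcId.computeIfAbsent(new NonceKey(group, nonce), k -> nextProcId++);
    }

    int procedureCount() {
      return nonceToProcId.size();
    }
  }

  private static final AtomicLong NONCES = new AtomicLong();

  // Pattern described in the report: a fresh nonce on every attempt.
  static long retryWithNewNonce(ToyMaster master, long group, int attempts) {
    long procId = -1;
    for (int i = 0; i < attempts; i++) {
      procId = master.submitModifyTable(group, NONCES.incrementAndGet());
    }
    return procId;
  }

  // Idempotent pattern: pick the nonce once and reuse it for every retry.
  static long retryWithStableNonce(ToyMaster master, long group, int attempts) {
    long nonce = NONCES.incrementAndGet();
    long procId = -1;
    for (int i = 0; i < attempts; i++) {
      procId = master.submitModifyTable(group, nonce);
    }
    return procId;
  }

  public static void main(String[] args) {
    ToyMaster freshNonces = new ToyMaster();
    retryWithNewNonce(freshNonces, 42L, 2);
    System.out.println("fresh nonce per retry -> procedures: " + freshNonces.procedureCount());

    ToyMaster stableNonce = new ToyMaster();
    retryWithStableNonce(stableNonce, 42L, 2);
    System.out.println("stable nonce across retries -> procedures: " + stableNonce.procedureCount());
  }
}
```

Running `main` prints 2 procedures for the fresh-nonce loop and 1 for the stable-nonce loop. In this model, deduplication only works when the nonce is fixed before the retry loop, not regenerated inside it.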
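Point 2 of the report describes a retry loop that cannot make progress. The toy model below uses hypothetical names throughout and does not reproduce HBase's ModifyTableProcedure or MoveRegionProcedure. It shows the shape of the problem: if any region is not online, the per-region step fails, the whole step is retried, and since nothing opens the region in between, every attempt fails the same way. The `maxAttempts` cap exists only so the demo terminates. The `reopenOnlineOnly` variant is an assumption offered for contrast, not the fix committed for this issue: it skips regions that are not online, on the premise that such a region picks up the modified table descriptor when it is next opened.

```java
import java.util.List;

public class ReopenLivelockSketch {
  enum RegionState { OPEN, CLOSED }

  // Hypothetical per-region reopen step that, like the MoveRegionProcedure
  // described in the report, refuses regions that are not online.
  static boolean reopenRegion(RegionState state) {
    return state == RegionState.OPEN;
  }

  // Strict variant: one failing region fails the whole step, which is then
  // retried from scratch. Returns the attempt that succeeded, or -1 on giving up.
  static int reopenAllStrict(List<RegionState> regions, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      boolean allOk = true;
      for (RegionState r : regions) {
        if (!reopenRegion(r)) {
          allOk = false;
          break;
        }
      }
      if (allOk) {
        return attempt;
      }
      // Nothing in this model opens a CLOSED region between attempts,
      // so every retry fails identically (the "stuck" behavior).
    }
    return -1;
  }

  // Assumed alternative for contrast: reopen only online regions and leave
  // closed ones to load the new descriptor when they are opened later.
  static int reopenOnlineOnly(List<RegionState> regions) {
    int reopened = 0;
    for (RegionState r : regions) {
      if (r == RegionState.OPEN && reopenRegion(r)) {
        reopened++;
      }
    }
    return reopened;
  }

  public static void main(String[] args) {
    List<RegionState> regions =
        List.of(RegionState.OPEN, RegionState.CLOSED, RegionState.OPEN);
    System.out.println("strict: " + reopenAllStrict(regions, 5));
    System.out.println("online-only reopened: " + reopenOnlineOnly(regions));
  }
}
```

With one closed region out of three, the strict loop exhausts all five attempts and returns -1, while the online-only variant reopens the two open regions. Without the artificial cap, the strict loop would never return.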