[ 
https://issues.apache.org/jira/browse/HBASE-20706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506890#comment-16506890
 ] 

Duo Zhang commented on HBASE-20706:
-----------------------------------

I guess this is not safe for reopening a region. If a region is in OPENING 
state, then it may have already loaded everything at the RS side, so if we do 
not reopen it then it will miss the new configurations which have just 
modified...

So I think we'd better introduce a new procedure called ReopenRegionProcedure, 
the different between this procedure and MRP is that, it never fails. And in 
the parent issue, we can use something like the openSeqNum to determine whether 
the region has been reopened, and if not, we will schedule a reopen procedure 
for it again.

> [hack] Don't add known not-OPEN regions in reopen phase of MTP
> --------------------------------------------------------------
>
>                 Key: HBASE-20706
>                 URL: https://issues.apache.org/jira/browse/HBASE-20706
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Critical
>             Fix For: 3.0.0, 2.1.0, 2.0.1
>
>         Attachments: HBASE-20706.001.branch-2.0.patch
>
>
> Shake-down of ModifyTableProcedure, talked this one out with Stack – "proper" 
> fix is likely pending in HBASE-20682. Using MoveRegionProcedure is likely the 
> wrong construct, we would want something specific to reopen (e.g. a 
> ReopenProcedure).
> However, we're in a really bad state right now. If there are non-open regions 
> for a table which has a modify submitted against it, the entire system locks 
> up in a fast-spin while holding the table's lock. This fills up HDFS with PV2 
> wals, and prevents you from doing anything in the hbase shell to try to fix 
> those unassigned regions. You'll see spam in the master log like:
> {noformat}
> 2018-06-07 03:21:29,448 WARN  [PEWorker-1] procedure.ModifyTableProcedure: 
> Retriable error trying to modify table=METRIC_RECORD_HOURLY_UUID (in 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS)
> org.apache.hadoop.hbase.client.DoNotRetryRegionException: 
> a3dc333606d38aeb6e2ab4b94233cfbc is not OPEN
>         at 
> org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193)
>         at 
> org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.<init>(MoveRegionProcedure.java:67)
>         at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767)
>         at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705)
>         at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128)
>         at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50)
>         at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> {noformat}
> We unstuck out internal test cluster giving the following change on top of 
> Sergey's HBASE-20657. When choosing the regions to reopen, if we filter out a 
> table's regions to only be those which are currently OPEN. There may be some 
> transient failures here as well, but a subsequent retry of the reopen step 
> should filter out that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to