Josh Elser commented on HBASE-20706:

bq. Oh, yeah, after fixing unit tests

Yeah, of course. Trying to free myself to do that today

bq. you also need to include the region in OPENING state

I'm still not getting the "why". I equate a region in OPENING with already 
having a Procedure running which is doing something to that region. Thus, if 
there is such a procedure, wouldn't that procedure be holding an appropriate 
lock (on table or region) which would preclude this MTP from having run in the 
first place?

bq. we also need to check other states to see if there are some states which 
the region may have already been initialized at RS side.

I'm sorry, but I just don't understand what to take from this. Are you 
suggesting that the Master is not the source of truth and that the Master would 
have to reach out to a RegionServer to figure out something about a Region it 
wants to Move (or reopen as the context may be)? Or are you just thinking 
out loud about what the pitfalls would be if we tried to do stuff with OPENING 
regions that are in flight (back to my confusion on why to include OPENING in 
the first place)?

Thanks in advance.

> [hack] Don't add known not-OPEN regions in reopen phase of MTP
> --------------------------------------------------------------
>                 Key: HBASE-20706
>                 URL: https://issues.apache.org/jira/browse/HBASE-20706
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Critical
>             Fix For: 3.0.0, 2.1.0, 2.0.1
>         Attachments: HBASE-20706.001.branch-2.0.patch
> Shake-down of ModifyTableProcedure, talked this one out with Stack – "proper" 
> fix is likely pending in HBASE-20682. Using MoveRegionProcedure is likely the 
> wrong construct, we would want something specific to reopen (e.g. a 
> ReopenProcedure).
> However, we're in a really bad state right now. If there are non-open regions 
> for a table which has a modify submitted against it, the entire system locks 
> up in a fast-spin while holding the table's lock. This fills up HDFS with PV2 
> wals, and prevents you from doing anything in the hbase shell to try to fix 
> those unassigned regions. You'll see spam in the master log like:
> {noformat}
> 2018-06-07 03:21:29,448 WARN  [PEWorker-1] procedure.ModifyTableProcedure: 
> Retriable error trying to modify table=METRIC_RECORD_HOURLY_UUID (in 
> org.apache.hadoop.hbase.client.DoNotRetryRegionException: 
> a3dc333606d38aeb6e2ab4b94233cfbc is not OPEN
>         at 
> org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193)
>         at 
> org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.<init>(MoveRegionProcedure.java:67)
>         at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767)
>         at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705)
>         at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128)
>         at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50)
>         at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> {noformat}
> We unstuck our internal test cluster using the following change on top of 
> Sergey's HBASE-20657. When choosing the regions to reopen, we filter a 
> table's regions down to only those which are currently OPEN. There may be some 
> transient failures here as well, but a subsequent retry of the reopen step 
> should filter out that change.
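
For readers following along, the filtering idea in the description above can be sketched roughly as below. This is only an illustration, not the attached patch: the class ReopenFilterSketch, the State enum, RegionEntry, and regionsToReopen are hypothetical stand-ins, not HBase's actual classes or method names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReopenFilterSketch {
    // Hypothetical, simplified set of region states (an assumption for illustration).
    enum State { OPEN, OPENING, CLOSING, CLOSED, OFFLINE }

    // Hypothetical pairing of a region's encoded name with the state the Master records for it.
    static final class RegionEntry {
        final String encodedName;
        final State state;

        RegionEntry(String encodedName, State state) {
            this.encodedName = encodedName;
            this.state = state;
        }
    }

    // Keep only regions currently recorded as OPEN, so that no reopen is
    // attempted for a region that is known not to be OPEN.
    static List<RegionEntry> regionsToReopen(List<RegionEntry> tableRegions) {
        List<RegionEntry> open = new ArrayList<>();
        for (RegionEntry region : tableRegions) {
            if (region.state == State.OPEN) {
                open.add(region);
            }
        }
        return open;
    }

    public static void main(String[] args) {
        List<RegionEntry> regions = Arrays.asList(
            new RegionEntry("region-a", State.OPEN),
            new RegionEntry("region-b", State.OPENING),
            new RegionEntry("region-c", State.OFFLINE));
        for (RegionEntry region : regionsToReopen(regions)) {
            System.out.println("would reopen " + region.encodedName);
        }
    }
}
```

A region whose state changes between the filter and the reopen attempt could still fail; per the description, the expectation is that a retry of the reopen step re-runs the filter and drops it.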
