[ 
https://issues.apache.org/jira/browse/HBASE-20706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510120#comment-16510120
 ] 

stack commented on HBASE-20706:
-------------------------------

bq. I'm still not getting the "why". I equate a region in OPENING....

[~Apache9] has a point.

The AssignProcedure works by setting the region state to OPENING in hbase:meta 
in the info:state column before it rpc's the RS to ask it open the region. 
After dispatching the RPC, the Procedure then suspends and remains suspended 
until it gets notice from the RS that the open has succeeded.

So, this patch has MTP look for OPEN regions only. Duo's point is that a region 
whose state is OPENING when MTP runs will soon be in the OPEN state; it'll have 
missed the edits MTP did. Need to add OPENING to MTP at least.

Newly made split daughters will be in SPLITTING_NEW but they are offline at 
this stage. Subsequently they are assigned (SPLITTING_NEW=>OPENING => OPEN). 
Merging regions are unassigned first, the merged region is created and added to 
hbase:meta iwth MERGED state. It is then assigned (MERGED=>OPENING=>OPEN).

So, adding OPENING as state that needs to be reopened makes sense.

Working on the ReopenRegionProcedure. Its taking a while.


> [hack] Don't add known not-OPEN regions in reopen phase of MTP
> --------------------------------------------------------------
>
>                 Key: HBASE-20706
>                 URL: https://issues.apache.org/jira/browse/HBASE-20706
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Critical
>             Fix For: 3.0.0, 2.1.0, 2.0.1
>
>         Attachments: HBASE-20706.001.branch-2.0.patch
>
>
> Shake-down of ModifyTableProcedure, talked this one out with Stack – "proper" 
> fix is likely pending in HBASE-20682. Using MoveRegionProcedure is likely the 
> wrong construct, we would want something specific to reopen (e.g. a 
> ReopenProcedure).
> However, we're in a really bad state right now. If there are non-open regions 
> for a table which has a modify submitted against it, the entire system locks 
> up in a fast-spin while holding the table's lock. This fills up HDFS with PV2 
> wals, and prevents you from doing anything in the hbase shell to try to fix 
> those unassigned regions. You'll see spam in the master log like:
> {noformat}
> 2018-06-07 03:21:29,448 WARN  [PEWorker-1] procedure.ModifyTableProcedure: 
> Retriable error trying to modify table=METRIC_RECORD_HOURLY_UUID (in 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS)
> org.apache.hadoop.hbase.client.DoNotRetryRegionException: 
> a3dc333606d38aeb6e2ab4b94233cfbc is not OPEN
>         at 
> org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193)
>         at 
> org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.<init>(MoveRegionProcedure.java:67)
>         at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767)
>         at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705)
>         at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128)
>         at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50)
>         at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> {noformat}
> We unstuck out internal test cluster giving the following change on top of 
> Sergey's HBASE-20657. When choosing the regions to reopen, if we filter out a 
> table's regions to only be those which are currently OPEN. There may be some 
> transient failures here as well, but a subsequent retry of the reopen step 
> should filter out that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to