[ https://issues.apache.org/jira/browse/HBASE-20706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510143#comment-16510143 ]
Josh Elser commented on HBASE-20706:
------------------------------------

{quote}So, this patch has MTP look for OPEN regions only. Duo's point is that a region whose state is OPENING when MTP runs will soon be in the OPEN state; it'll have missed the edits MTP did. Need to add OPENING to MTP at least.
{quote}
Got it. Thanks for the explanation. I understand where you're coming from now. Sorry for not getting it the first time, both of you.

I'll make that mod with the UT fixes as well.

> [hack] Don't add known not-OPEN regions in reopen phase of MTP
> --------------------------------------------------------------
>
>                 Key: HBASE-20706
>                 URL: https://issues.apache.org/jira/browse/HBASE-20706
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Critical
>             Fix For: 3.0.0, 2.1.0, 2.0.1
>
>         Attachments: HBASE-20706.001.branch-2.0.patch
>
>
> Shake-down of ModifyTableProcedure, talked this one out with Stack – "proper"
> fix is likely pending in HBASE-20682. Using MoveRegionProcedure is likely the
> wrong construct, we would want something specific to reopen (e.g. a
> ReopenProcedure).
> However, we're in a really bad state right now. If there are non-open regions
> for a table which has a modify submitted against it, the entire system locks
> up in a fast-spin while holding the table's lock. This fills up HDFS with PV2
> wals, and prevents you from doing anything in the hbase shell to try to fix
> those unassigned regions.
> You'll see spam in the master log like:
> {noformat}
> 2018-06-07 03:21:29,448 WARN [PEWorker-1] procedure.ModifyTableProcedure: Retriable error trying to modify table=METRIC_RECORD_HOURLY_UUID (in state=MODIFY_TABLE_REOPEN_ALL_REGIONS)
> org.apache.hadoop.hbase.client.DoNotRetryRegionException: a3dc333606d38aeb6e2ab4b94233cfbc is not OPEN
>         at org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193)
>         at org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.<init>(MoveRegionProcedure.java:67)
>         at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767)
>         at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705)
>         at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128)
>         at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50)
>         at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
>         at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> {noformat}
> We unstuck our internal test cluster by applying the following change on top of
> Sergey's HBASE-20657. When choosing the regions to reopen, we filter a
> table's regions down to only those which are currently OPEN.
> There may be some transient failures here as well, but a subsequent retry of
> the reopen step should filter out that change.
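
For anyone following along, here is a rough, standalone sketch of the filtering idea discussed above: keep only regions whose state is OPEN or, per the comment, OPENING when picking what to reopen. This is not the attached patch and does not use HBase's real classes; `State`, `Region`, and `regionsToReopen` are hypothetical names made up for illustration, and the choice of which states to keep is an assumption taken from the discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class ReopenFilterSketch {
    // Hypothetical, simplified stand-in for a region's assignment state.
    public enum State { OFFLINE, OPENING, OPEN, CLOSING, CLOSED }

    // Hypothetical pairing of an encoded region name with its current state.
    public static final class Region {
        public final String encodedName;
        public final State state;

        public Region(String encodedName, State state) {
            this.encodedName = encodedName;
            this.state = state;
        }
    }

    // Assumption: states worth reopening. OPENING is included because such a
    // region will soon be OPEN and would otherwise miss the table modification.
    private static final Set<State> REOPENABLE = EnumSet.of(State.OPEN, State.OPENING);

    // Returns only the regions whose state is in REOPENABLE, preserving order.
    public static List<Region> regionsToReopen(List<Region> regions) {
        List<Region> selected = new ArrayList<>();
        for (Region region : regions) {
            if (REOPENABLE.contains(region.state)) {
                selected.add(region);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("region-a", State.OPEN),
            new Region("region-b", State.OFFLINE),
            new Region("region-c", State.OPENING));
        // Prints the encoded names of the regions that would be reopened.
        for (Region region : regionsToReopen(regions)) {
            System.out.println(region.encodedName + " " + region.state);
        }
    }
}
```

Skipping the not-OPEN regions up front means the reopen step never tries to build a move for a region that would fail the online check, which is what drives the retry spin in the log above. Regions that change state between the filter and the reopen would still need a retry of that step.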