Josh Elser created HBASE-20706:
----------------------------------
Summary: [hack] Don't add known not-OPEN regions in reopen phase
of MTP
Key: HBASE-20706
URL: https://issues.apache.org/jira/browse/HBASE-20706
Project: HBase
Issue Type: Sub-task
Components: amv2
Reporter: Josh Elser
Assignee: Josh Elser
Fix For: 3.0.0, 2.1.0, 2.0.1
Shake-down of ModifyTableProcedure; talked this one out with Stack. The "proper"
fix is likely pending in HBASE-20682. Using MoveRegionProcedure is likely the
wrong construct; we would want something specific to reopen (e.g. a
ReopenProcedure).
However, we're in a really bad state right now. If there are non-open regions
for a table which has a modify submitted against it, the entire system locks up
in a fast spin while holding the table's lock. This fills up HDFS with PV2
WALs and prevents you from doing anything in the hbase shell to try to fix
those unassigned regions. You'll see spam in the master log like:
{noformat}
2018-06-07 03:21:29,448 WARN [PEWorker-1] procedure.ModifyTableProcedure: Retriable error trying to modify table=METRIC_RECORD_HOURLY_UUID (in state=MODIFY_TABLE_REOPEN_ALL_REGIONS)
org.apache.hadoop.hbase.client.DoNotRetryRegionException: a3dc333606d38aeb6e2ab4b94233cfbc is not OPEN
    at org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193)
    at org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.<init>(MoveRegionProcedure.java:67)
    at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767)
    at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705)
    at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128)
    at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
    at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
{noformat}
We unstuck our internal test cluster using the following change on top of
Sergey's HBASE-20657: when choosing the regions to reopen, we filter the
table's regions down to only those which are currently OPEN. There may be some
transient failures here as well, but a subsequent retry of the reopen step
should filter out any region whose state has changed in the meantime.
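
The filtering idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: `ReopenFilterSketch`, `Region`, `State`, and `regionsToReopen` are hypothetical stand-ins and do not correspond to the real HBase classes or to the signature of `AssignmentManager.createReopenProcedures`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained sketch; none of these types are the real HBase classes.
class ReopenFilterSketch {

    // Simplified stand-in for a region's assignment state.
    enum State { OPEN, OPENING, CLOSING, CLOSED, OFFLINE }

    // Simplified stand-in for a region plus its current state.
    static final class Region {
        final String encodedName;
        final State state;

        Region(String encodedName, State state) {
            this.encodedName = encodedName;
            this.state = state;
        }
    }

    // Keep only the regions that are currently OPEN, so the reopen step never
    // tries to build a move/reopen procedure for a region that would fail the
    // "is not OPEN" check and trigger the retry loop described above.
    static List<Region> regionsToReopen(List<Region> tableRegions) {
        List<Region> open = new ArrayList<>();
        for (Region region : tableRegions) {
            if (region.state == State.OPEN) {
                open.add(region);
            }
        }
        return open;
    }
}
```

Because the filter is evaluated each time the reopen step runs, a region that leaves the OPEN state between attempts would simply be dropped on the next retry, which is the behavior described above.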