Michael Stack created HBASE-23904:
-------------------------------------
Summary: Procedure updating meta and Master shutdown are incompatible: CODE-BUG
Key: HBASE-23904
URL: https://issues.apache.org/jira/browse/HBASE-23904
Project: HBase
Issue Type: Bug
Components: amv2
Reporter: Michael Stack
Chasing flakies and studying TestMasterAbortWhileMergingTable, I noticed a test failure caused by the following:
{code:java}
2020-02-27 00:57:51,702 ERROR [PEWorker-6] procedure2.ProcedureExecutor(1688): CODE-BUG: Uncaught runtime exception: pid=14, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, locked=true; MergeTableRegionsProcedure table=test, regions=[48c9be922fa4356bfc7fc61b5b0785f3, ef196d5377c5c1d143e9a2a2ea056a9c], force=false
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@28b956c7 rejected from java.util.concurrent.ThreadPoolExecutor@639f20e5[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
	at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:974)
	at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:953)
	at org.apache.hadoop.hbase.MetaTableAccessor.multiMutate(MetaTableAccessor.java:1771)
	at org.apache.hadoop.hbase.MetaTableAccessor.mergeRegions(MetaTableAccessor.java:1637)
	at org.apache.hadoop.hbase.master.assignment.RegionStateStore.mergeRegions(RegionStateStore.java:268)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsMerged(AssignmentManager.java:1854)
	at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.updateMetaForMergedRegions(MergeTableRegionsProcedure.java:687)
	at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:229)
	at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:77)
	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194)
	at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1669)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1416)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:79)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1986)
{code}
About 80 ms earlier in the log, as part of the test, we had stopped the Master:
{code:java}
2020-02-27 00:57:51,620 INFO [Time-limited test] regionserver.HRegionServer(2212): ***** STOPPING region server 'rn-hbased-lapp01.rno.exampl.com,36587,1582765058324' *****
2020-02-27 00:57:51,620 INFO [Time-limited test] regionserver.HRegionServer(2226): STOPPED: Stopping master 0
{code}
The rejected execution derails the merge procedure, and it surfaces as an unhandled CODE-BUG.
Why we let a runtime exception out when trying to update meta is mildly interesting. We use Throwables.propagateIfPossible(e, IOException.class) from Guava, which at first blush would seem to throw the exception if it is an IOE and otherwise return. In our code, if it returns, we wrap whatever makes it through in an IOE. But propagateIfPossible is a little sneaky: if the passed exception is a RuntimeException, as the RejectedExecutionException is, it goes ahead and throws it rather than returning. Not sure if this was the author's understanding ([~zhangduo]? HBASE-21789, for hbase-2.2.0). Looking at the old code, which called makeIOExceptionOfException from ProtobufUtil, if I read it right, it would wrap the exception in an IOE regardless of whether it was a RuntimeException or not.
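To make the difference concrete, here is a minimal, self-contained sketch. It does not call Guava: propagateIfPossibleLike is a hypothetical re-implementation of the contract documented for Throwables.propagateIfPossible(Throwable, Class) (rethrow if the throwable is an instance of the declared type, a RuntimeException, or an Error; otherwise return), and updateMeta is a hypothetical stand-in that only simulates the "propagate, else wrap in an IOE" calling pattern described above.

{code:java}
import java.io.IOException;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeoutException;

public class PropagateDemo {

  // Hypothetical re-implementation of the documented contract of Guava's
  // Throwables.propagateIfPossible(Throwable, Class<X>); not Guava's source.
  static <X extends Throwable> void propagateIfPossibleLike(Throwable t, Class<X> declaredType)
      throws X {
    if (declaredType.isInstance(t)) {
      throw declaredType.cast(t); // the declared checked type: rethrow as-is
    }
    if (t instanceof RuntimeException) {
      throw (RuntimeException) t; // the sneaky part: unchecked exceptions are rethrown too
    }
    if (t instanceof Error) {
      throw (Error) t;
    }
    // Anything else: return normally and let the caller decide.
  }

  // Hypothetical stand-in for the meta update: propagate if possible, else wrap in an IOE.
  static void updateMeta(Exception failure) throws IOException {
    try {
      throw failure; // simulate the meta update failing with 'failure'
    } catch (Exception e) {
      propagateIfPossibleLike(e, IOException.class);
      throw new IOException(e); // reached only for checked, non-IOE exceptions
    }
  }

  // Reports what a caller of updateMeta actually sees for a given failure.
  static String classify(Exception failure) {
    try {
      updateMeta(failure);
      return "no exception";
    } catch (IOException ioe) {
      return "IOException";
    } catch (RuntimeException re) {
      return "escapes as " + re.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(classify(new IOException("rs down")));          // IOException
    System.out.println(classify(new TimeoutException("slow rpc")));    // IOException (wrapped)
    System.out.println(classify(new RejectedExecutionException("x"))); // escapes as RejectedExecutionException
  }
}
{code}

Only the checked, non-IOE case ever reaches the wrapping line; the RejectedExecutionException skips it entirely, which is how it ends up uncaught in ProcedureExecutor.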
A little digging suggests that the likely root of the problem is that the Master is stopping. Its connection, which the merge procedure uses when updating meta, is being shut down too. The rejection is probably because the connection's pool has already been shut down; the ThreadPoolExecutor in the trace reports itself as Terminated. It is hard to tell for sure, as the Master doesn't log the minutiae of which services it closes.
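The following sketch uses only java.util.concurrent, no HBase code, to show why a stopped pool would produce exactly this exception. A ThreadPoolExecutor with the default AbortPolicy rejects any task submitted after shutdown(), and its toString() reports the pool as Terminated, as in the trace above. The pool here is an assumed stand-in for the connection's pool, not the actual HBase one.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class RejectedAfterShutdown {

  // Submits a task to a pool that has already been shut down and returns the
  // rejection message, or null if the task was (unexpectedly) accepted.
  static String submitToTerminatedPool() {
    // Assumed stand-in for the connection's pool; the size is arbitrary.
    ExecutorService pool = Executors.newFixedThreadPool(2);
    pool.shutdown(); // analogous to the Master closing its connection
    try {
      pool.awaitTermination(5, TimeUnit.SECONDS); // no tasks ran, so this completes at once
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
    try {
      // Same path as the trace: AbstractExecutorService.submit -> execute -> reject.
      pool.submit(() -> "mutate hbase:meta");
      return null;
    } catch (RejectedExecutionException e) {
      // Default AbortPolicy message: "Task ... rejected from ...[Terminated, ...]"
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(submitToTerminatedPool());
  }
}
{code}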
The propagateIfPossible facility is used in a few places, but HBASE-21789 added it to MetaTableAccessor in only one place, so I could restore the old behavior easily enough. (I was afraid we would have to deal with this issue around ALL meta table accesses via MTA.)
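A minimal sketch of what restoring the old behavior could look like, assuming (per the reading above) that the old ProtobufUtil.makeIOExceptionOfException path wrapped any non-IOE in an IOException. toIOException and mutateMeta are hypothetical names for illustration, not MetaTableAccessor's API.

{code:java}
import java.io.IOException;
import java.util.concurrent.RejectedExecutionException;

public class WrapAsIOException {

  // Hypothetical helper: IOEs pass through unchanged; everything else,
  // RuntimeExceptions included, is wrapped in a new IOException.
  static IOException toIOException(Exception e) {
    return e instanceof IOException ? (IOException) e : new IOException(e);
  }

  // Hypothetical stand-in for the meta update, failing the way the trace shows.
  static void mutateMeta() throws IOException {
    try {
      throw new RejectedExecutionException("pool terminated");
    } catch (Exception e) {
      throw toIOException(e);
    }
  }

  public static void main(String[] args) {
    try {
      mutateMeta();
    } catch (IOException e) {
      // The caller now gets a checked IOE instead of an uncaught RuntimeException.
      System.out.println("IOException caused by " + e.getCause().getClass().getSimpleName());
    }
  }
}
{code}

With this shape the procedure receives a checked IOException on its normal error path instead of an unchecked exception escaping to ProcedureExecutor, where it is reported as CODE-BUG.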
--
This message was sent by Atlassian Jira
(v8.3.4#803005)