[
https://issues.apache.org/jira/browse/HBASE-23904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047975#comment-17047975
]
Michael Stack commented on HBASE-23904:
---------------------------------------
I filed HBASE-23910 to consider where we should be wrapping RuntimeExceptions
in IOEs when making RPCs.
Will push this patch since it restores the old behavior and makes us at least
consistent around edits to meta by Procedures.
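
As a rough illustration of what "restoring the old behavior" amounts to, the sketch below wraps any non-IOException, RuntimeExceptions included, in an IOException. The names `WrapSketch`, `toIOException` and `updateMeta` are hypothetical and written only for this note; this is not the actual ProtobufUtil or MetaTableAccessor code.

```java
import java.io.IOException;
import java.util.concurrent.RejectedExecutionException;

class WrapSketch {
  // Hypothetical helper: pass IOExceptions through, wrap everything else.
  static IOException toIOException(Exception e) {
    return (e instanceof IOException) ? (IOException) e : new IOException(e);
  }

  // Hypothetical stand-in for a meta update that may fail at runtime.
  static void updateMeta(Runnable mutation) throws IOException {
    try {
      mutation.run();
    } catch (Exception e) {
      // RuntimeExceptions are wrapped too, so callers only ever see IOException.
      throw toIOException(e);
    }
  }

  public static void main(String[] args) {
    try {
      updateMeta(() -> {
        throw new RejectedExecutionException("pool terminated");
      });
    } catch (IOException e) {
      System.out.println("IOException caused by "
          + e.getCause().getClass().getSimpleName());
    }
  }
}
```

With this shape, a caller that already handles IOException sees a failed meta update rather than an uncaught runtime exception.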
> Procedure updating meta and Master shutdown are incompatible: CODE-BUG
> ----------------------------------------------------------------------
>
> Key: HBASE-23904
> URL: https://issues.apache.org/jira/browse/HBASE-23904
> Project: HBase
> Issue Type: Bug
> Components: amv2
> Reporter: Michael Stack
> Priority: Major
>
> Chasing flakies, studying TestMasterAbortWhileMergingTable, I noticed a
> failure caused by:
> {code:java}
> 2020-02-27 00:57:51,702 ERROR [PEWorker-6] procedure2.ProcedureExecutor(1688): CODE-BUG: Uncaught runtime exception: pid=14, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, locked=true; MergeTableRegionsProcedure table=test, regions=[48c9be922fa4356bfc7fc61b5b0785f3, ef196d5377c5c1d143e9a2a2ea056a9c], force=false
> java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@28b956c7 rejected from java.util.concurrent.ThreadPoolExecutor@639f20e5[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5]
>     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
>     at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
>     at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:974)
>     at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:953)
>     at org.apache.hadoop.hbase.MetaTableAccessor.multiMutate(MetaTableAccessor.java:1771)
>     at org.apache.hadoop.hbase.MetaTableAccessor.mergeRegions(MetaTableAccessor.java:1637)
>     at org.apache.hadoop.hbase.master.assignment.RegionStateStore.mergeRegions(RegionStateStore.java:268)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsMerged(AssignmentManager.java:1854)
>     at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.updateMetaForMergedRegions(MergeTableRegionsProcedure.java:687)
>     at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:229)
>     at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:77)
>     at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1669)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1416)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:79)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1986)
> {code}
> Moments earlier in the log, as part of the test, we'd stopped the Master:
> {code:java}
> 2020-02-27 00:57:51,620 INFO [Time-limited test] regionserver.HRegionServer(2212): ***** STOPPING region server 'rn-hbased-lapp01.rno.exampl.com,36587,1582765058324' *****
> 2020-02-27 00:57:51,620 INFO [Time-limited test] regionserver.HRegionServer(2226): STOPPED: Stopping master 0
> {code}
> The rejected execution damages the merge procedure. It shows as an unhandled
> CODE-BUG.
> Why we let a runtime exception out when trying to update meta is mildly
> interesting. We use Throwables.propagateIfPossible(e, IOException.class)
> from guava, which at first blush would seem to throw the exception if it is
> an IOE and otherwise return. In the code, if it returns, we wrap whatever
> makes it through in an IOE. But propagateIfPossible is a little sneaky: if
> the passed exception is a RuntimeException, as the
> RejectedExecutionException is, it goes ahead and throws and does NOT
> return. Not sure if this was the authors' understanding ([~zhangduo] ?
> HBASE-21789 for hbase-2.2.0). Looking at the old code, which called
> makeIOExceptionOfException from ProtobufUtil, if I read it right, it would
> wrap the exception in an IOE regardless of whether it was a
> RuntimeException or not.
> A little digging suggests the likely root of the problem is that the Master
> is stopping. Its connection, which the merge procedure uses when updating
> meta, is being shut down too. The RejectedExecutionException is probably
> because the pool has been shut down. Hard to tell for sure, as the Master
> doesn't log the minutiae of services closed.
> The propagateIfPossible facility is used in a few places. Its addition to
> MetaTableAccessor is in one place only, by HBASE-21789. I could restore the
> old behavior easily enough (I was afraid we would have to deal with this
> issue around ALL meta table accesses via MTA).
>
>
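
The interaction described in the issue can be reproduced with the JDK alone. In this minimal sketch, `propagateIfPossibleLike` is a hypothetical stand-in that mimics the documented contract of Guava's `Throwables.propagateIfPossible(Throwable, Class)`: rethrow if the throwable is the declared type, a RuntimeException or an Error, otherwise return. `callAndConvert` and `describeOutcome` are likewise illustrative names, not HBase code.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PropagateSketch {
  // Hypothetical stand-in mimicking the contract of Guava's
  // Throwables.propagateIfPossible(Throwable, Class).
  static <X extends Throwable> void propagateIfPossibleLike(
      Throwable t, Class<X> declaredType) throws X {
    if (declaredType.isInstance(t)) {
      throw declaredType.cast(t);
    }
    if (t instanceof RuntimeException) {
      throw (RuntimeException) t; // thrown as-is, never reaches the wrap below
    }
    if (t instanceof Error) {
      throw (Error) t;
    }
  }

  // The pattern from the issue: propagate first, then wrap what is left.
  static void callAndConvert(Runnable work) throws IOException {
    try {
      work.run();
    } catch (Throwable e) {
      propagateIfPossibleLike(e, IOException.class);
      throw new IOException(e); // only reached for other checked exceptions
    }
  }

  // Submits to a pool that is already shut down, as on a stopping Master.
  static String describeOutcome() {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    pool.shutdown();
    try {
      callAndConvert(() -> pool.submit(() -> {}));
      return "no exception";
    } catch (IOException e) {
      return "IOException";
    } catch (RuntimeException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(describeOutcome()); // prints RejectedExecutionException
  }
}
```

The rejection escapes as a RuntimeException instead of an IOException, which matches the uncaught CODE-BUG in the log above.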