[
https://issues.apache.org/jira/browse/HBASE-23904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Stack resolved HBASE-23904.
-----------------------------------
Fix Version/s: 2.3.0
3.0.0
Resolution: Fixed
Pushed to branch-2 and master.
> Procedure updating meta and Master shutdown are incompatible: CODE-BUG
> ----------------------------------------------------------------------
>
> Key: HBASE-23904
> URL: https://issues.apache.org/jira/browse/HBASE-23904
> Project: HBase
> Issue Type: Bug
> Components: amv2
> Reporter: Michael Stack
> Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> Chasing flakies, studying TestMasterAbortWhileMergingTable, I noticed a
> failure because
> {code:java}
> 2020-02-27 00:57:51,702 ERROR [PEWorker-6]
> procedure2.ProcedureExecutor(1688): CODE-BUG: Uncaught runtime exception:
> pid=14, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, locked=true;
> MergeTableRegionsProcedure table=test,
> regions=[48c9be922fa4356bfc7fc61b5b0785f3, ef196d5377c5c1d143e9a2a2ea056a9c],
> force=false
> java.util.concurrent.RejectedExecutionException: Task
> java.util.concurrent.FutureTask@28b956c7 rejected from
> java.util.concurrent.ThreadPoolExecutor@639f20e5[Terminated, pool size = 0,
> active threads = 0, queued tasks = 0, completed tasks = 5]
> at
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
> at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> at
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
> at
> org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:974)
> at
> org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:953)
> at
> org.apache.hadoop.hbase.MetaTableAccessor.multiMutate(MetaTableAccessor.java:1771)
> at
> org.apache.hadoop.hbase.MetaTableAccessor.mergeRegions(MetaTableAccessor.java:1637)
> at
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.mergeRegions(RegionStateStore.java:268)
> at
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsMerged(AssignmentManager.java:1854)
> at
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.updateMetaForMergedRegions(MergeTableRegionsProcedure.java:687)
> at
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:229)
> at
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:77)
> at
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194)
> at
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1669)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1416)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:79)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1986)
> {code}
> A moment earlier in the log, as part of the test, we had stopped the Master
> {code:java}
> 2020-02-27 00:57:51,620 INFO [Time-limited test]
> regionserver.HRegionServer(2212): ***** STOPPING region server
> 'rn-hbased-lapp01.rno.exampl.com,36587,1582765058324' *****
> 2020-02-27 00:57:51,620 INFO [Time-limited test]
> regionserver.HRegionServer(2226): STOPPED: Stopping master 0 {code}
> The rejected execution damages the merge procedure. It shows as an unhandled
> CODE-BUG.
> Why we let a runtime exception out when trying to update meta is mildly
> interesting. We use Throwables.propagateIfPossible(e, IOException.class)
> from guava, which at first blush would seem to throw the exception if it is
> an IOE and otherwise return. In the code, if it returns, we wrap whatever
> makes it through in an IOE. But propagateIfPossible is a little sneaky: if
> the passed exception is a RuntimeException, as the rejection is, it goes
> ahead and throws and does NOT return. Not sure if this was the authors'
> understanding ([~zhangduo] ? HBASE-21789 for hbase-2.2.0). Looking at the
> old code, which called makeIOExceptionOfException from ProtobufUtil, if I
> read it right, it would wrap the exception in an IOE regardless of whether
> it was a RuntimeException or not.
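
To make the behavior described above concrete, here is a minimal sketch in plain Java. It does not use guava itself: the local propagateIfPossible below is a stand-in that mirrors the documented contract of Throwables.propagateIfPossible(Throwable, Class) (rethrow as-is when the throwable is a RuntimeException, an Error, or the declared type, otherwise return). The class name PropagateSketch and the helpers rethrowAsIOException and escapingType are hypothetical names for illustration, not HBase code.

```java
import java.io.IOException;
import java.util.concurrent.RejectedExecutionException;

public class PropagateSketch {

  // Stand-in for guava's Throwables.propagateIfPossible(t, declaredType):
  // rethrows t unchanged if it is a RuntimeException, an Error, or an
  // instance of declaredType; otherwise returns normally.
  static <X extends Throwable> void propagateIfPossible(Throwable t, Class<X> declaredType)
      throws X {
    if (t instanceof RuntimeException) {
      throw (RuntimeException) t;
    }
    if (t instanceof Error) {
      throw (Error) t;
    }
    if (declaredType.isInstance(t)) {
      throw declaredType.cast(t);
    }
  }

  // Hypothetical caller shaped like the pattern discussed in the issue:
  // propagate if possible, then wrap whatever is left in an IOException.
  static void rethrowAsIOException(Throwable t) throws IOException {
    propagateIfPossible(t, IOException.class);
    throw new IOException(t);
  }

  // Reports the simple class name of the exception that escapes the caller.
  static String escapingType(Throwable t) {
    try {
      rethrowAsIOException(t);
    } catch (IOException e) {
      return "IOException";
    } catch (RuntimeException e) {
      return e.getClass().getSimpleName();
    }
    return "none";
  }

  public static void main(String[] args) {
    // A checked, non-IO exception is wrapped as the caller intends.
    System.out.println(escapingType(new Exception("checked")));
    // An IOException is rethrown as-is.
    System.out.println(escapingType(new IOException("io")));
    // A RuntimeException bypasses the wrapping and escapes unchanged.
    System.out.println(escapingType(new RejectedExecutionException("rejected")));
  }
}
```

Under these assumptions the first two calls report IOException, while the third reports RejectedExecutionException, which matches the unwrapped runtime exception seen in the stack trace.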
> A little digging suggests that the likely root of the problem is that the
> Master is stopping. Its connection, which the merge procedure uses when
> updating meta, is being shut down too. The rejected execution is probably
> because the pool has been shut down. Hard to tell for sure as the Master
> doesn't log the minutiae of services closed.
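
The suspected mechanism (submitting work to a pool that has already been shut down) can be reproduced with the JDK alone. The sketch below is only an illustration of that general java.util.concurrent behavior; it does not touch HBase, and the class name TerminatedPoolSketch and the task contents are made up for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class TerminatedPoolSketch {

  // Returns true if a submit() after shutdown() is rejected with a
  // RejectedExecutionException (the default AbortPolicy behavior).
  static boolean submitAfterShutdownIsRejected() {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    pool.shutdown(); // stands in for the connection's pool closing with the Master
    Callable<String> task = () -> "meta update"; // placeholder for the real work
    try {
      pool.submit(task);
      return false;
    } catch (RejectedExecutionException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("rejected: " + submitAfterShutdownIsRejected());
  }
}
```

Because RejectedExecutionException is a RuntimeException, a caller that only expects IOException from the meta update will not catch it.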
> The propagateIfPossible facility is used in a few places. Its addition to
> MetaTableAccessor is in one place only, by HBASE-21789. I could restore the
> old behavior easily enough (I was afraid we would have to deal with this
> issue around ALL meta table accesses via MTA).
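
As a rough sketch of what "restoring the old behavior" could look like, the helper below converts any throwable to an IOException, wrapping runtime exceptions too. This is an assumption based only on the description above, not the actual makeIOExceptionOfException implementation or the committed fix; WrapAllSketch and toIOException are hypothetical names.

```java
import java.io.IOException;
import java.util.concurrent.RejectedExecutionException;

public class WrapAllSketch {

  // Returns t itself if it is already an IOException; otherwise wraps it,
  // whether it is a checked exception or a RuntimeException.
  static IOException toIOException(Throwable t) {
    return (t instanceof IOException) ? (IOException) t : new IOException(t);
  }

  public static void main(String[] args) {
    IOException wrapped = toIOException(new RejectedExecutionException("pool terminated"));
    // The original exception stays available as the cause.
    System.out.println(wrapped.getCause().getClass().getSimpleName());
  }
}
```

With this shape the procedure would see an IOException from the meta update even when the underlying failure is a rejected task, and the original exception remains attached as the cause.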
>
>