[ 
https://issues.apache.org/jira/browse/HBASE-23904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046182#comment-17046182
 ] 

Duo Zhang commented on HBASE-23904:
-----------------------------------

The idea is that, a RuntimeException usually means we have code bug. So here if 
we do think the rejection should be handled, then we can catch it explicitly?

> Procedure updating meta and Master shutdown are incompatible: CODE-BUG
> ----------------------------------------------------------------------
>
>                 Key: HBASE-23904
>                 URL: https://issues.apache.org/jira/browse/HBASE-23904
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>            Reporter: Michael Stack
>            Priority: Major
>
> Chasing flakies, studying TestMasterAbortWhileMergingTable, I noticed a 
> failure because
> {code:java}
> 2020-02-27 00:57:51,702 ERROR [PEWorker-6] 
> procedure2.ProcedureExecutor(1688): CODE-BUG: Uncaught runtime exception: 
> pid=14, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, locked=true; 
> MergeTableRegionsProcedure table=test, 
> regions=[48c9be922fa4356bfc7fc61b5b0785f3, ef196d5377c5c1d143e9a2a2ea056a9c], 
> force=false
> java.util.concurrent.RejectedExecutionException: Task 
> java.util.concurrent.FutureTask@28b956c7 rejected from 
> java.util.concurrent.ThreadPoolExecutor@639f20e5[Terminated, pool size = 0, 
> active threads = 0, queued tasks = 0, completed tasks = 5]
>         at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
>         at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
>         at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
>         at 
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
>         at 
> org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:974)
>         at 
> org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:953)
>         at 
> org.apache.hadoop.hbase.MetaTableAccessor.multiMutate(MetaTableAccessor.java:1771)
>         at 
> org.apache.hadoop.hbase.MetaTableAccessor.mergeRegions(MetaTableAccessor.java:1637)
>         at 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.mergeRegions(RegionStateStore.java:268)
>         at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsMerged(AssignmentManager.java:1854)
>         at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.updateMetaForMergedRegions(MergeTableRegionsProcedure.java:687)
>         at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:229)
>         at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:77)
>         at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194)
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1669)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1416)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:79)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1986)
>  {code}
> A few seconds above, as part of the test, we'd stopped Master
> {code:java}
> 2020-02-27 00:57:51,620 INFO  [Time-limited test] 
> regionserver.HRegionServer(2212): ***** STOPPING region server 
> 'rn-hbased-lapp01.rno.exampl.com,36587,1582765058324' *****
> 2020-02-27 00:57:51,620 INFO  [Time-limited test] 
> regionserver.HRegionServer(2226): STOPPED: Stopping master 0 {code}
> The rejected execution damages the merge procedure. It shows as an unhandled 
> CODE-BUG.
> Why we let a runtime exception out when trying to update meta is mildly 
> interesting. We use Throwables.propagateIfPossible(e, 
> IOException.{color:#000080}class{color}) from guava which at first blush 
> would seem to throw the exception if it an IOE else return. In code, if 
> return, we'll wrap whatever makes it through with an IOE.  But 
> propagateIfPossible is a little sneaky in that if the passed Exception is a 
> RuntimeException, as the Reject is, it will go ahead and throw and NOT 
> return.  Not sure if this was authors' understanding ([~zhangduo]  ? 
> HBASE-21789 for hbase-2.2.0). Looking at the old code, which called 
> makeIOExceptionOfException from ProtobufUtil, if I read it right, this would 
> wrap the exception in an IOE regardless whether a RuntimeException or not.
> A little digging exposes that likely root of the problem is that the Master 
> is stopping. Its connection, which is used by the merge procedure when 
> updating meta, is being shutdown too. The rejected exception is probably 
> because the pool has been shutdown. Hard to tell for sure as Master doesn't 
> log the minutae of services closed.
> The propagateIfPossible facility is used in a few places. Its addition to 
> MetaTableAccessor is in one place only by HBASE-21789. I could restore the 
> old behavior easy enough (Was afraid we had to deal with this issue around 
> ALL meta table accesses via MTA).
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to