Michael Stack created HBASE-23904:
-------------------------------------

             Summary: Procedure updating meta and Master shutdown are 
incompatible: CODE-BUG
                 Key: HBASE-23904
                 URL: https://issues.apache.org/jira/browse/HBASE-23904
             Project: HBase
          Issue Type: Bug
          Components: amv2
            Reporter: Michael Stack


While chasing flaky tests and studying TestMasterAbortWhileMergingTable, I 
noticed a failure caused by:
{code:java}
2020-02-27 00:57:51,702 ERROR [PEWorker-6] procedure2.ProcedureExecutor(1688): 
CODE-BUG: Uncaught runtime exception: pid=14, 
state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, locked=true; 
MergeTableRegionsProcedure table=test, 
regions=[48c9be922fa4356bfc7fc61b5b0785f3, ef196d5377c5c1d143e9a2a2ea056a9c], 
force=false
java.util.concurrent.RejectedExecutionException: Task 
java.util.concurrent.FutureTask@28b956c7 rejected from 
java.util.concurrent.ThreadPoolExecutor@639f20e5[Terminated, pool size = 0, 
active threads = 0, queued tasks = 0, completed tasks = 5]
        at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
        at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
        at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
        at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
        at 
org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:974)
        at 
org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:953)
        at 
org.apache.hadoop.hbase.MetaTableAccessor.multiMutate(MetaTableAccessor.java:1771)
        at 
org.apache.hadoop.hbase.MetaTableAccessor.mergeRegions(MetaTableAccessor.java:1637)
        at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.mergeRegions(RegionStateStore.java:268)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsMerged(AssignmentManager.java:1854)
        at 
org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.updateMetaForMergedRegions(MergeTableRegionsProcedure.java:687)
        at 
org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:229)
        at 
org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:77)
        at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194)
        at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1669)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1416)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:79)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1986)
 {code}
Just above this in the log (about 80 ms earlier), as part of the test, we'd 
stopped the Master:
{code:java}

2020-02-27 00:57:51,620 INFO  [Time-limited test] 
regionserver.HRegionServer(2212): ***** STOPPING region server 
'rn-hbased-lapp01.rno.exampl.com,36587,1582765058324' *****
2020-02-27 00:57:51,620 INFO  [Time-limited test] 
regionserver.HRegionServer(2226): STOPPED: Stopping master 0 {code}
The rejected execution damages the merge procedure. It shows as an unhandled 
CODE-BUG.

Why we let a runtime exception out when trying to update meta is mildly 
interesting. We use Throwables.propagateIfPossible(e, IOException.class) from 
guava, which at first blush would seem to throw the exception if it is an IOE 
and otherwise return. In the code, if it returns, we wrap whatever makes it 
through in an IOE. But propagateIfPossible is a little sneaky: if the passed 
exception is a RuntimeException, as the RejectedExecutionException is, it goes 
ahead and throws and does NOT return. Not sure if this was the authors' 
understanding ([~zhangduo]? HBASE-21789 for hbase-2.2.0). Looking at the old 
code, which called makeIOExceptionOfException from ProtobufUtil, if I read it 
right, it would wrap the exception in an IOE regardless of whether it was a 
RuntimeException or not.
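
To make the propagateIfPossible behavior concrete, here is a minimal sketch. 
It does not use guava: the propagateIfPossible below is a local stand-in 
written to mirror guava's documented contract (rethrow if the throwable is of 
the declared type, a RuntimeException, or an Error; otherwise return). The 
class name PropagateDemo and the methods rethrow and escapes are hypothetical, 
not HBase code.

```java
import java.io.IOException;
import java.util.concurrent.RejectedExecutionException;

final class PropagateDemo {
  // Stand-in for guava's Throwables.propagateIfPossible(Throwable, Class<X>):
  // rethrows t if it is an X, a RuntimeException, or an Error; else returns.
  static <X extends Throwable> void propagateIfPossible(Throwable t, Class<X> declaredType)
      throws X {
    if (declaredType.isInstance(t)) {
      throw declaredType.cast(t);
    }
    if (t instanceof RuntimeException) {
      throw (RuntimeException) t;
    }
    if (t instanceof Error) {
      throw (Error) t;
    }
  }

  // Hypothetical caller shaped like the pattern described above:
  // propagate if possible, otherwise wrap in an IOException.
  static void rethrow(Throwable e) throws IOException {
    propagateIfPossible(e, IOException.class);
    throw new IOException(e);
  }

  // Reports what actually escapes from rethrow for a given failure.
  static String escapes(Throwable failure) {
    try {
      rethrow(failure);
      return "nothing";
    } catch (IOException e) {
      return e.getCause() == null
          ? "IOException"
          : "IOException wrapping " + e.getCause().getClass().getSimpleName();
    } catch (RuntimeException e) {
      // The RuntimeException bypassed the IOException wrapping entirely.
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(escapes(new IOException("io")));
    System.out.println(escapes(new InterruptedException("checked")));
    System.out.println(escapes(new RejectedExecutionException("rejected")));
  }
}
```

The last case is the one that matters here: the RejectedExecutionException 
comes out unwrapped, so a caller that only handles IOException never sees it.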

A little digging suggests that the likely root of the problem is that the 
Master is stopping. Its connection, which the merge procedure uses when 
updating meta, is being shut down too. The RejectedExecutionException is 
probably because the pool has been shut down. Hard to tell for sure as the 
Master doesn't log the minutiae of services closed.
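
As a sanity check on the "pool has been shut down" theory, the following 
JDK-only sketch shows that submitting to an executor after shutdown() is 
rejected with a RejectedExecutionException under the default AbortPolicy, 
which is the handler named in the stack trace. The class and method names are 
hypothetical, and the pool here is a plain fixed thread pool, not the one the 
Master's connection actually uses.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

final class RejectedAfterShutdownDemo {
  // Returns true if a task submitted after shutdown() is rejected.
  static boolean submitAfterShutdownIsRejected() {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    pool.shutdown(); // stands in for the connection's pool being closed
    try {
      pool.submit(() -> "meta update"); // stands in for the meta mutation
      return false;
    } catch (RejectedExecutionException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("rejected after shutdown: " + submitAfterShutdownIsRejected());
  }
}
```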

The propagateIfPossible facility is used in a few places. Its addition to 
MetaTableAccessor is in one place only, by HBASE-21789. I could restore the old 
behavior easily enough (I was afraid we had to deal with this issue around ALL 
meta table accesses via MTA).
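
A rough sketch of what "restore the old behavior" could look like: pass an 
IOException through untouched and wrap everything else, RuntimeExceptions 
included, in an IOException. The helper toIOException and the class name are 
hypothetical; this is not the actual makeIOExceptionOfException code from 
ProtobufUtil, only an illustration of the wrap-regardless idea.

```java
import java.io.IOException;
import java.util.concurrent.RejectedExecutionException;

final class WrapAsIOExceptionDemo {
  // Hypothetical helper: IOExceptions pass through, anything else is wrapped.
  static IOException toIOException(Exception e) {
    return (e instanceof IOException) ? (IOException) e : new IOException(e);
  }

  public static void main(String[] args) {
    IOException wrapped = toIOException(new RejectedExecutionException("rejected"));
    // The runtime exception now travels as the cause of a checked IOException.
    System.out.println(wrapped.getCause().getClass().getSimpleName());
  }
}
```

With this shape, the rejected submission would reach the procedure as an 
IOException rather than as an uncaught runtime exception.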

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
