[
https://issues.apache.org/jira/browse/HBASE-26866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang resolved HBASE-26866.
-------------------------------
Fix Version/s: 3.0.0-alpha-3
Hadoop Flags: Reviewed
Resolution: Fixed
Merged to master.
Thanks [~Xiaolin Ha] for reviewing.
> Shutdown WAL may abort region server
> ------------------------------------
>
> Key: HBASE-26866
> URL: https://issues.apache.org/jira/browse/HBASE-26866
> Project: HBase
> Issue Type: Bug
> Components: wal
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
> Fix For: 3.0.0-alpha-3
>
>
> https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3140/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestSyncReplicationActive-output.txt
> TestSyncReplicationActive is flaky because we may abort the region server
> when shutting down the WAL.
> {noformat}
> 2022-03-18T04:50:37,205 WARN [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=36877] master.MasterRpcServices(682): jenkins-hbase13.apache.org,33377,1647579008859 reported a fatal error:
> ***** ABORTING region server jenkins-hbase13.apache.org,33377,1647579008859: Log rolling failed *****
> Cause:
> java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$$Lambda$681/1458648270@37209753 rejected from java.util.concurrent.ThreadPoolExecutor@69662eb7[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
>     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
>     at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
>     at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.cleanOldLogs(AbstractFSWAL.java:773)
>     at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriterInternal(AbstractFSWAL.java:935)
>     at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$rollWriter$8(AbstractFSWAL.java:953)
>     at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:196)
>     at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:953)
>     at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:316)
>     at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:214)
> {noformat}
> The problem is that WAL removal is asynchronous. When shutting down the WAL,
> we close the thread pool, so a later cleanup submission throws
> RejectedExecutionException and causes the region server to abort.
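
For illustration only, the failure mode described above can be reproduced with plain java.util.concurrent, outside HBase. The sketch below is not the patch merged for this issue. The class name ShutdownRejectSketch and the helper submitCleanup are hypothetical. It shows that execute() on an executor that has already been shut down throws RejectedExecutionException, and one possible guard, which assumes that a rejected cleanup task can safely be dropped once the pool is shutting down.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class ShutdownRejectSketch {

  // Hypothetical helper, not an HBase API: submit a cleanup task, but treat a
  // rejection as benign when the pool has already been shut down.
  // Returns true if the task was accepted, false if it was dropped.
  static boolean submitCleanup(ExecutorService pool, Runnable task) {
    try {
      pool.execute(task);
      return true;
    } catch (RejectedExecutionException e) {
      if (pool.isShutdown()) {
        // Assumption: skipping cleanup during shutdown is acceptable.
        return false;
      }
      // Rejected for some other reason (for example a saturated pool): rethrow.
      throw e;
    }
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.shutdown(); // simulate closing the pool while shutting down

    // Unguarded submission: the default AbortPolicy rejects the task.
    boolean rejected = false;
    try {
      pool.execute(() -> { });
    } catch (RejectedExecutionException e) {
      rejected = true;
    }
    System.out.println("unguarded submit rejected: " + rejected);

    // Guarded submission: the rejection is swallowed and reported as false.
    System.out.println("guarded submit accepted: " + submitCleanup(pool, () -> { }));
  }
}
```

Whether dropping the task is correct depends on what the cleanup does. In the scenario above the rejected task only removes old WAL files, but that assumption would need to be checked against the actual code.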
--
This message was sent by Atlassian Jira
(v8.20.1#820001)