Duo Zhang created HBASE-29802:
---------------------------------
Summary: NPE when shutting down mini cluster cause tests hang
Key: HBASE-29802
URL: https://issues.apache.org/jira/browse/HBASE-29802
Project: HBase
Issue Type: Bug
Components: mapreduce, test
Reporter: Duo Zhang
Found this in a timeout run for TestSecureExportSnapshot
{noformat}
2026-01-02T22:53:58,988 INFO [RegionServerTracker-0] master.HMaster(3360):
***** STOPPING master '59fad5da8b71,38027,1767393931926' *****
2026-01-02T22:53:58,988 INFO [RegionServerTracker-0] master.HMaster(3362):
STOPPED: Cluster shutdown set; onlineServer=0
2026-01-02T22:53:58,988 INFO [M:0;59fad5da8b71:38027]
hbase.HBaseServerBase(457): Close async cluster connection
2026-01-02T22:53:58,988 INFO [M:0;59fad5da8b71:38027]
hbase.HBaseServerBase(440): Shutdown chores and chore service
2026-01-02T22:53:58,988 DEBUG [M:0;59fad5da8b71:38027] cleaner.LogCleaner(212):
Cancelling LogCleaner
2026-01-02T22:53:58,989 DEBUG [M:0;59fad5da8b71:38027]
cleaner.HFileCleaner(335): Stopping file delete threads
2026-01-02T22:53:58,989 WARN [OldWALsCleaner-0] cleaner.LogCleaner(200):
Interrupted while cleaning old WALs, will try to clean it next round. Exiting.
2026-01-02T22:53:58,989 DEBUG
[master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.large.0-1767393935168]
cleaner.HFileCleaner(306): Exit
Thread[master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.large.0-1767393935168,5,FailOnTimeoutGroup]
2026-01-02T22:53:58,989 DEBUG
[master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.small.0-1767393935174]
cleaner.HFileCleaner(306): Exit
Thread[master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.small.0-1767393935174,5,FailOnTimeoutGroup]
2026-01-02T22:53:58,989 INFO [M:0;59fad5da8b71:38027] hbase.ChoreService(370):
Chore service for: master/59fad5da8b71:0 had [ScheduledChore
name=FlushedSequenceIdFlusher, period=10800000, unit=MILLISECONDS] on shutdown
2026-01-02T22:53:58,989 INFO [M:0;59fad5da8b71:38027]
hbase.HBaseServerBase(450): Shutdown executor service
2026-01-02T22:53:58,989 DEBUG [M:0;59fad5da8b71:38027] master.HMaster(1813):
Stopping service threads
2026-01-02T22:53:58,989 INFO [M:0;59fad5da8b71:38027]
procedure2.RemoteProcedureDispatcher(119): Stopping procedure remote dispatcher
2026-01-02T22:53:58,989 INFO [M:0;59fad5da8b71:38027]
procedure2.ProcedureExecutor(723): Stopping
2026-01-02T22:53:58,990 INFO [M:0;59fad5da8b71:38027]
region.RegionProcedureStore(110): Stopping the Region Procedure Store,
isAbort=false
2026-01-02T22:53:58,991 DEBUG [Time-limited test-EventThread]
zookeeper.ZKWatcher(609): master:38027-0x100734ba5eb0000,
quorum=127.0.0.1:59100, baseZNode=/hbase Received ZooKeeper Event,
type=NodeDeleted, state=SyncConnected, path=/hbase/master
2026-01-02T22:53:58,992 DEBUG [Time-limited test-EventThread]
zookeeper.ZKWatcher(609): master:38027-0x100734ba5eb0000,
quorum=127.0.0.1:59100, baseZNode=/hbase Received ZooKeeper Event,
type=NodeChildrenChanged, state=SyncConnected, path=/hbase
2026-01-02T22:53:58,992 DEBUG [M:0;59fad5da8b71:38027] zookeeper.ZKUtil(347):
master:38027-0x100734ba5eb0000, quorum=127.0.0.1:59100, baseZNode=/hbase Unable
to get data of znode /hbase/master because node does not exist (not an error)
2026-01-02T22:53:58,992 WARN [M:0;59fad5da8b71:38027]
master.ActiveMasterManager(344): Failed get of master address:
java.io.IOException: Can't get master address from ZooKeeper; znode data == null
2026-01-02T22:53:58,993 INFO [M:0;59fad5da8b71:38027]
master.ServerManager(1148): Writing .lastflushedseqids file at:
hdfs://localhost:44733/user/jenkins/test-data/eb9fcf01-d40e-546d-7f81-2198b3d62cff/.lastflushedseqids
2026-01-02T22:53:58,994 DEBUG [normalizer-worker-0]
normalizer.RegionNormalizerWorker(193): interrupt detected. terminating.
2026-01-02T22:53:59,019 INFO [Block report processor]
blockmanagement.BlockManager(3777): BLOCK* addStoredBlock: 127.0.0.1:37189 is
added to blk_1073742422_1598 (size=307)
2026-01-02T22:53:59,019 INFO [Block report processor]
blockmanagement.BlockManager(3777): BLOCK* addStoredBlock: 127.0.0.1:38631 is
added to blk_1073742422_1598 (size=307)
2026-01-02T22:53:59,020 INFO [Block report processor]
blockmanagement.BlockManager(3777): BLOCK* addStoredBlock: 127.0.0.1:40115 is
added to blk_1073742422_1598 (size=307)
2026-01-02T22:53:59,020 INFO [M:0;59fad5da8b71:38027]
assignment.AssignmentManager(409): Stopping assignment manager
2026-01-02T22:53:59,021 INFO [M:0;59fad5da8b71:38027]
region.MasterRegion(264): Closing local region {ENCODED =>
1595e783b53d99cd5eef43b6debb2682, NAME =>
'master:store,,1.1595e783b53d99cd5eef43b6debb2682.', STARTKEY => '', ENDKEY =>
''}, isAbort=false
2026-01-02T22:53:59,021 DEBUG [M:0;59fad5da8b71:38027]
regionserver.HRegion(1725): Closing 1595e783b53d99cd5eef43b6debb2682, disabling
compactions & flushes
2026-01-02T22:53:59,042 INFO [M:0;59fad5da8b71:38027]
regionserver.HRegion(1758): Closing region
master:store,,1.1595e783b53d99cd5eef43b6debb2682.
2026-01-02T22:53:59,042 DEBUG [M:0;59fad5da8b71:38027]
regionserver.HRegion(1779): Time limited wait for close lock on
master:store,,1.1595e783b53d99cd5eef43b6debb2682.
2026-01-02T22:53:59,042 DEBUG [M:0;59fad5da8b71:38027]
regionserver.HRegion(1846): Acquired close lock on
master:store,,1.1595e783b53d99cd5eef43b6debb2682. after waiting 0 ms
2026-01-02T22:53:59,043 DEBUG [M:0;59fad5da8b71:38027]
regionserver.HRegion(1856): Updates disabled for region
master:store,,1.1595e783b53d99cd5eef43b6debb2682.
2026-01-02T22:53:59,043 INFO [M:0;59fad5da8b71:38027]
regionserver.HRegion(2969): Flushing 1595e783b53d99cd5eef43b6debb2682 4/4
column families, dataSize=1013.53 KB heapSize=1.19 MB
2026-01-02T22:53:59,043 ERROR
[AsyncFSWAL-0-hdfs://localhost:44733/user/jenkins/test-data/eb9fcf01-d40e-546d-7f81-2198b3d62cff/MasterData-prefix:59fad5da8b71,38027,1767393931926]
server.NIOServerCnxnFactory(85): Thread
Thread[AsyncFSWAL-0-hdfs://localhost:44733/user/jenkins/test-data/eb9fcf01-d40e-546d-7f81-2198b3d62cff/MasterData-prefix:59fad5da8b71,38027,1767393931926,5,FailOnTimeoutGroup]
died
java.lang.NullPointerException: Cannot invoke
"org.apache.hbase.thirdparty.io.netty.buffer.ByteBuf.readableBytes()" because
"this.buf" is null
at
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.buffered(FanOutOneBlockAsyncDFSOutput.java:414)
at
org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.append(AsyncProtobufLogWriter.java:134)
at
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doAppend(AsyncFSWAL.java:181)
at
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doAppend(AsyncFSWAL.java:100)
at
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.appendEntry(AbstractFSWAL.java:1333)
at
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.appendAndSync(AbstractFSWAL.java:1724)
at
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.consume(AbstractFSWAL.java:1832)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
{noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)