Duo Zhang created HBASE-29802:
---------------------------------

             Summary: NPE when shutting down mini cluster cause tests hang
                 Key: HBASE-29802
                 URL: https://issues.apache.org/jira/browse/HBASE-29802
             Project: HBase
          Issue Type: Bug
          Components: mapreduce, test
            Reporter: Duo Zhang


Found this in a timeout run for TestSecureExportSnapshot

{noformat}
2026-01-02T22:53:58,988 INFO  [RegionServerTracker-0] master.HMaster(3360): 
***** STOPPING master '59fad5da8b71,38027,1767393931926' *****
2026-01-02T22:53:58,988 INFO  [RegionServerTracker-0] master.HMaster(3362): 
STOPPED: Cluster shutdown set; onlineServer=0
2026-01-02T22:53:58,988 INFO  [M:0;59fad5da8b71:38027] 
hbase.HBaseServerBase(457): Close async cluster connection
2026-01-02T22:53:58,988 INFO  [M:0;59fad5da8b71:38027] 
hbase.HBaseServerBase(440): Shutdown chores and chore service
2026-01-02T22:53:58,988 DEBUG [M:0;59fad5da8b71:38027] cleaner.LogCleaner(212): 
Cancelling LogCleaner
2026-01-02T22:53:58,989 DEBUG [M:0;59fad5da8b71:38027] 
cleaner.HFileCleaner(335): Stopping file delete threads
2026-01-02T22:53:58,989 WARN  [OldWALsCleaner-0] cleaner.LogCleaner(200): 
Interrupted while cleaning old WALs, will try to clean it next round. Exiting.
2026-01-02T22:53:58,989 DEBUG 
[master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.large.0-1767393935168] 
cleaner.HFileCleaner(306): Exit 
Thread[master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.large.0-1767393935168,5,FailOnTimeoutGroup]
2026-01-02T22:53:58,989 DEBUG 
[master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.small.0-1767393935174] 
cleaner.HFileCleaner(306): Exit 
Thread[master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.small.0-1767393935174,5,FailOnTimeoutGroup]
2026-01-02T22:53:58,989 INFO  [M:0;59fad5da8b71:38027] hbase.ChoreService(370): 
Chore service for: master/59fad5da8b71:0 had [ScheduledChore 
name=FlushedSequenceIdFlusher, period=10800000, unit=MILLISECONDS] on shutdown
2026-01-02T22:53:58,989 INFO  [M:0;59fad5da8b71:38027] 
hbase.HBaseServerBase(450): Shutdown executor service
2026-01-02T22:53:58,989 DEBUG [M:0;59fad5da8b71:38027] master.HMaster(1813): 
Stopping service threads
2026-01-02T22:53:58,989 INFO  [M:0;59fad5da8b71:38027] 
procedure2.RemoteProcedureDispatcher(119): Stopping procedure remote dispatcher
2026-01-02T22:53:58,989 INFO  [M:0;59fad5da8b71:38027] 
procedure2.ProcedureExecutor(723): Stopping
2026-01-02T22:53:58,990 INFO  [M:0;59fad5da8b71:38027] 
region.RegionProcedureStore(110): Stopping the Region Procedure Store, 
isAbort=false
2026-01-02T22:53:58,991 DEBUG [Time-limited test-EventThread] 
zookeeper.ZKWatcher(609): master:38027-0x100734ba5eb0000, 
quorum=127.0.0.1:59100, baseZNode=/hbase Received ZooKeeper Event, 
type=NodeDeleted, state=SyncConnected, path=/hbase/master
2026-01-02T22:53:58,992 DEBUG [Time-limited test-EventThread] 
zookeeper.ZKWatcher(609): master:38027-0x100734ba5eb0000, 
quorum=127.0.0.1:59100, baseZNode=/hbase Received ZooKeeper Event, 
type=NodeChildrenChanged, state=SyncConnected, path=/hbase
2026-01-02T22:53:58,992 DEBUG [M:0;59fad5da8b71:38027] zookeeper.ZKUtil(347): 
master:38027-0x100734ba5eb0000, quorum=127.0.0.1:59100, baseZNode=/hbase Unable 
to get data of znode /hbase/master because node does not exist (not an error)
2026-01-02T22:53:58,992 WARN  [M:0;59fad5da8b71:38027] 
master.ActiveMasterManager(344): Failed get of master address: 
java.io.IOException: Can't get master address from ZooKeeper; znode data == null
2026-01-02T22:53:58,993 INFO  [M:0;59fad5da8b71:38027] 
master.ServerManager(1148): Writing .lastflushedseqids file at: 
hdfs://localhost:44733/user/jenkins/test-data/eb9fcf01-d40e-546d-7f81-2198b3d62cff/.lastflushedseqids
2026-01-02T22:53:58,994 DEBUG [normalizer-worker-0] 
normalizer.RegionNormalizerWorker(193): interrupt detected. terminating.
2026-01-02T22:53:59,019 INFO  [Block report processor] 
blockmanagement.BlockManager(3777): BLOCK* addStoredBlock: 127.0.0.1:37189 is 
added to blk_1073742422_1598 (size=307)
2026-01-02T22:53:59,019 INFO  [Block report processor] 
blockmanagement.BlockManager(3777): BLOCK* addStoredBlock: 127.0.0.1:38631 is 
added to blk_1073742422_1598 (size=307)
2026-01-02T22:53:59,020 INFO  [Block report processor] 
blockmanagement.BlockManager(3777): BLOCK* addStoredBlock: 127.0.0.1:40115 is 
added to blk_1073742422_1598 (size=307)
2026-01-02T22:53:59,020 INFO  [M:0;59fad5da8b71:38027] 
assignment.AssignmentManager(409): Stopping assignment manager
2026-01-02T22:53:59,021 INFO  [M:0;59fad5da8b71:38027] 
region.MasterRegion(264): Closing local region {ENCODED => 
1595e783b53d99cd5eef43b6debb2682, NAME => 
'master:store,,1.1595e783b53d99cd5eef43b6debb2682.', STARTKEY => '', ENDKEY => 
''}, isAbort=false
2026-01-02T22:53:59,021 DEBUG [M:0;59fad5da8b71:38027] 
regionserver.HRegion(1725): Closing 1595e783b53d99cd5eef43b6debb2682, disabling 
compactions & flushes
2026-01-02T22:53:59,042 INFO  [M:0;59fad5da8b71:38027] 
regionserver.HRegion(1758): Closing region 
master:store,,1.1595e783b53d99cd5eef43b6debb2682.
2026-01-02T22:53:59,042 DEBUG [M:0;59fad5da8b71:38027] 
regionserver.HRegion(1779): Time limited wait for close lock on 
master:store,,1.1595e783b53d99cd5eef43b6debb2682.
2026-01-02T22:53:59,042 DEBUG [M:0;59fad5da8b71:38027] 
regionserver.HRegion(1846): Acquired close lock on 
master:store,,1.1595e783b53d99cd5eef43b6debb2682. after waiting 0 ms
2026-01-02T22:53:59,043 DEBUG [M:0;59fad5da8b71:38027] 
regionserver.HRegion(1856): Updates disabled for region 
master:store,,1.1595e783b53d99cd5eef43b6debb2682.
2026-01-02T22:53:59,043 INFO  [M:0;59fad5da8b71:38027] 
regionserver.HRegion(2969): Flushing 1595e783b53d99cd5eef43b6debb2682 4/4 
column families, dataSize=1013.53 KB heapSize=1.19 MB
2026-01-02T22:53:59,043 ERROR 
[AsyncFSWAL-0-hdfs://localhost:44733/user/jenkins/test-data/eb9fcf01-d40e-546d-7f81-2198b3d62cff/MasterData-prefix:59fad5da8b71,38027,1767393931926]
 server.NIOServerCnxnFactory(85): Thread 
Thread[AsyncFSWAL-0-hdfs://localhost:44733/user/jenkins/test-data/eb9fcf01-d40e-546d-7f81-2198b3d62cff/MasterData-prefix:59fad5da8b71,38027,1767393931926,5,FailOnTimeoutGroup]
 died
java.lang.NullPointerException: Cannot invoke 
"org.apache.hbase.thirdparty.io.netty.buffer.ByteBuf.readableBytes()" because 
"this.buf" is null
        at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.buffered(FanOutOneBlockAsyncDFSOutput.java:414)
        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.append(AsyncProtobufLogWriter.java:134)
        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doAppend(AsyncFSWAL.java:181)
        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doAppend(AsyncFSWAL.java:100)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.appendEntry(AbstractFSWAL.java:1333)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.appendAndSync(AbstractFSWAL.java:1724)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.consume(AbstractFSWAL.java:1832)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to