[ https://issues.apache.org/jira/browse/HBASE-29376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
guluo resolved HBASE-29376.
---------------------------
    Fix Version/s: 3.0.0-beta-2
       Resolution: Fixed

> ReplicationLogCleaner.preClean/getDeletableFiles should return early when
> asyncClusterConnection closes during HMaster stopping
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-29376
>                 URL: https://issues.apache.org/jira/browse/HBASE-29376
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Replication
>         Environment: HBase master
>            Reporter: guluo
>            Assignee: guluo
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.0.0-beta-2
>
>
> When HMaster is stopping, I found that HBase printed a lot of exception logs
> like the following (with hbase.master.cleaner.interval = 10000 ms; configuring
> a smaller interval makes them appear more often):
> 2025-06-04T20:49:37,614 ERROR [master/hbase001:16000.Chore.2] master.ReplicationLogCleaner: Error occurred while executing queueStorage.hasData()
> org.apache.hadoop.hbase.replication.ReplicationException: failed to get replication queue table
>     at org.apache.hadoop.hbase.replication.TableReplicationQueueStorage.hasData(TableReplicationQueueStorage.java:538) ~[hbase-replication-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>     at org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner.preClean(ReplicationLogCleaner.java:86) ~[hbase-server-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>     at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
>     at org.apache.hadoop.hbase.master.cleaner.CleanerChore.preRunCleaner(CleanerChore.java:282) ~[hbase-server-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>     at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:257) ~[hbase-server-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>     at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:161) ~[hbase-common-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
>     at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107) ~[hbase-common-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
>     at java.lang.Thread.run(Thread.java:833) ~[?:?]
> Caused by: org.apache.hadoop.hbase.ipc.StoppedRpcClientException: Call to address=hbase001:16020 failed on local exception: org.apache.hadoop.hbase.ipc.StoppedRpcClientException
>     at java.lang.Thread.getStackTrace(Thread.java:1610) ~[?:?]
>     at org.apache.hadoop.hbase.util.FutureUtils.setStackTrace(FutureUtils.java:144) ~[hbase-common-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>     at org.apache.hadoop.hbase.util.FutureUtils.rethrow(FutureUtils.java:163) ~[hbase-common-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>     at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:186) ~[hbase-common-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>     at org.apache.hadoop.hbase.client.AdminOverAsyncAdmin.tableExists(AdminOverAsyncAdmin.java:130) ~[hbase-client-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>     at org.apache.hadoop.hbase.replication.TableReplicationQueueStorage.hasData(TableReplicationQueueStorage.java:536) ~[hbase-replication-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>     at org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner.preClean(ReplicationLogCleaner.java:86) ~[hbase-server-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>     at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
>
> The reason:
> When the HMaster service enters its stopping phase, the ReplicationLogCleaner
> task keeps running periodically. Each run invokes
> rpm.getQueueStorage().hasData() to check whether the replication queue has
> pending data.
> However, once HMaster has closed its asyncClusterConnection, the replication
> queue data can no longer be retrieved, because the underlying RPC client has
> already been shut down (hence the StoppedRpcClientException in the trace).
> So I think ReplicationLogCleaner should check whether
> HMaster.asyncClusterConnection is closed and return early in that case, so
> that HMaster shuts down gracefully.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
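A minimal, self-contained sketch of the early return proposed in this issue follows. It assumes the cleaner can ask whether the master's connection is closed. QueueStorage, SketchLogCleaner, errorsLogged, and the BooleanSupplier-based "closed" check are hypothetical stand-ins. They are not HBase's ReplicationLogCleaner, TableReplicationQueueStorage, or AsyncClusterConnection APIs, and the fix merged for HBASE-29376 may look different. Once the connection is closed, preClean skips the storage RPC that would otherwise fail and log an error. getDeletableFiles then deletes nothing, because this sketch assumes that keeping WALs is the safe default while replication state is unknown.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.BooleanSupplier;

public class EarlyReturnCleanerSketch {

  /** Hypothetical stand-in for the replication queue storage; not the HBase interface. */
  interface QueueStorage {
    boolean hasData() throws Exception;
  }

  /** Hypothetical cleaner modeling the early-return checks; not ReplicationLogCleaner itself. */
  static final class SketchLogCleaner {
    private final BooleanSupplier connectionClosed; // assumption: backed by the connection's closed state
    private final QueueStorage storage;
    private boolean queueHasData = true; // assume data exists until the storage says otherwise
    private boolean skipRound = false;   // set when this round cannot safely decide anything
    int errorsLogged = 0;                // counts logged errors, for illustration only

    SketchLogCleaner(BooleanSupplier connectionClosed, QueueStorage storage) {
      this.connectionClosed = connectionClosed;
      this.storage = storage;
    }

    void preClean() {
      skipRound = false;
      if (connectionClosed.getAsBoolean()) {
        // Master is stopping: return early instead of issuing an RPC that is bound to fail.
        skipRound = true;
        return;
      }
      try {
        queueHasData = storage.hasData();
      } catch (Exception e) {
        errorsLogged++;
        System.err.println("Error occurred while executing queueStorage.hasData(): " + e);
        skipRound = true;
      }
    }

    List<String> getDeletableFiles(List<String> files) {
      if (skipRound || connectionClosed.getAsBoolean()) {
        // Conservative choice in this sketch: delete nothing when replication state is unknown.
        return Collections.emptyList();
      }
      if (!queueHasData) {
        // No replication queue data, so no WAL is pinned by replication.
        return files;
      }
      // Per-file filtering against replication queue offsets is omitted here;
      // keeping every file is the safe fallback.
      return Collections.emptyList();
    }
  }

  public static void main(String[] args) {
    boolean[] closed = {false};
    // Storage that fails once the "connection" is closed, mimicking the StoppedRpcClientException.
    QueueStorage storage = () -> {
      if (closed[0]) {
        throw new IllegalStateException("rpc client stopped");
      }
      return false;
    };
    SketchLogCleaner cleaner = new SketchLogCleaner(() -> closed[0], storage);
    List<String> wals = List.of("wal.1", "wal.2");

    cleaner.preClean();
    System.out.println("running:  " + cleaner.getDeletableFiles(wals));

    closed[0] = true; // simulate HMaster closing its connection during shutdown
    cleaner.preClean();
    System.out.println("stopping: " + cleaner.getDeletableFiles(wals) + ", errors=" + cleaner.errorsLogged);
  }
}
```

Running main prints all WALs as deletable while the "master" is up. After the simulated close, it prints an empty list and zero logged errors, because preClean returns before touching the stopped RPC client.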