GitHub user otmanel31 added a comment to the discussion: [question] pulsar broker reboot
1) Also, two days ago we faced another issue (which triggered a shutdown of our deployment), where the brokers timed out on ZooKeeper calls. Before the first exception, I caught a lot of INFO logs within a few seconds, like the two lines below, for each topic on all brokers:

    09:49:31.186 [ForkJoinPool.commonPool-worker-1] INFO org.apache.pulsar.broker.service.AbstractTopic - Disabling publish throttling for persistent://my_tenant/my_ns/action_down-f640fa62-3245-41af-81f5-edf868118a9a
    09:49:31.186 [ForkJoinPool.commonPool-worker-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://my_tenant/my_ns/action_down-f640fa62-3245-41af-81f5-edf868118a9a] Policies updated successfully

For your information, we manage more than 25,000 topics, so these two lines appear for every active topic. Is there any request to ZooKeeper when these two log lines appear?
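While waiting for an answer, one way to get a feel for ZooKeeper latency on the policies path is to time a direct read of the namespace policies znode from outside the broker. This is a minimal sketch, assuming the standard /admin/policies/<tenant>/<namespace> znode layout; the connect string is a placeholder for your own ensemble:

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class PoliciesReadTimer {
        public static void main(String[] args) throws Exception {
            // Placeholder connect string: point it at your ZooKeeper ensemble.
            String connect = "zk-0:2181,zk-1:2181,zk-2:2181";
            CountDownLatch connected = new CountDownLatch(1);
            ZooKeeper zk = new ZooKeeper(connect, 30_000, (WatchedEvent e) -> {
                if (e.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            });
            connected.await();

            // Pulsar keeps namespace policies under /admin/policies/<tenant>/<namespace>.
            String path = "/admin/policies/my_tenant/my_ns";
            long start = System.nanoTime();
            byte[] data = zk.getData(path, false, null); // synchronous read, no watch
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Read " + data.length + " bytes in " + elapsedMs + " ms");
            zk.close();
        }
    }

If this read is slow or times out while the brokers are emitting the lines above, that points at the ZooKeeper ensemble rather than at the brokers themselves.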
2) Then the first exception thrown was:

    09:50:06.167 [pulsar-ordered-OrderedExecutor-1-0] WARN org.apache.pulsar.broker.service.BrokerService - Got exception when reading persistence policy for persistent://my_tenant/my_ns/data_down-5034936e-c9bb-4720-874d-e7e6e5e6d897: null
    java.util.concurrent.TimeoutException: null
        at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_252]
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_252]
        at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.1.jar:2.6.1]
        at org.apache.pulsar.broker.service.BrokerService.lambda$getManagedLedgerConfig$34(BrokerService.java:1074) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
        at org.apache.bookkeeper.mledger.util.SafeRun$2.safeRun(SafeRun.java:49) [org.apache.pulsar-managed-ledger-2.6.1.jar:2.6.1]
        at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]

    09:50:14.820 [prometheus-stats-43-1] ERROR org.apache.pulsar.broker.service.BacklogQuotaManager - Failed to read policies data, will apply the default backlog quota: namespace=my_tenant/my_ns
    java.util.concurrent.TimeoutException: null
        at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_252]
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_252]
        at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.1.jar:2.6.1]
        at org.apache.pulsar.broker.service.BacklogQuotaManager.getBacklogQuota(BacklogQuotaManager.java:64) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
        at org.apache.pulsar.broker.service.persistent.PersistentTopic.getBacklogQuota(PersistentTopic.java:1859) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
        at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.getTopicStats(NamespaceStatsAggregator.java:97) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
        at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.lambda$null$0(NamespaceStatsAggregator.java:65) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
        at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
        at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
        at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.lambda$null$1(NamespaceStatsAggregator.java:64) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
        at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
        at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
        at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.lambda$generate$2(NamespaceStatsAggregator.java:63) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
        at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
        at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
        at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.generate(NamespaceStatsAggregator.java:60) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
        at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsGenerator.generate(PrometheusMetricsGenerator.java:85) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
        at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet.lambda$doGet$0(PrometheusMetricsServlet.java:70) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
        at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.6.1.jar:2.6.1]
        at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_252]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_252]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]

The log then continued with truncated frames from further TimeoutExceptions, interleaved:

        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
        at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.generate(NamespaceStatsAggregator.java:60) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
        at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.lambda$generate$2(NamespaceStatsAggregator.java:63) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
    java.util.concurrent.TimeoutException: null
        at org.apache.pulsar.broker.service.BrokerService.lambda$getManagedLedgerConfig$34(BrokerService.java:1074) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
    java.util.concurrent.TimeoutException: null
        at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.6.1.jar:2.6.1]
        at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet.lambda$doGet$0(PrometheusMetricsServlet.java:70) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
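A side note on the "TimeoutException: null" wording in these traces: the "null" is just the exception's missing message, not a null policy value. The top frames show a bounded CompletableFuture.get(...) inside ZooKeeperDataCache.get, i.e. a ZooKeeper read that never completed within the operation timeout. A minimal sketch of that pattern (timeout shortened to 1 second for the demo):

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    public class TimeoutNullDemo {
        public static void main(String[] args) throws Exception {
            // A future that never completes, standing in for a ZooKeeper read
            // that gets no answer within the operation timeout.
            CompletableFuture<byte[]> zkRead = new CompletableFuture<>();
            try {
                zkRead.get(1, TimeUnit.SECONDS); // bounded get, as in ZooKeeperDataCache.get
            } catch (TimeoutException e) {
                // TimeoutException is thrown without a message, so the logger
                // prints "java.util.concurrent.TimeoutException: null".
                System.out.println(e.getClass().getName() + ": " + e.getMessage());
            }
        }
    }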
This exception was thrown only on my broker-0 (we have 4 running brokers and 4 ZooKeeper nodes). Then, once broker-0 was down, it seems load balancing did not work correctly: its load was not dispatched to the other brokers. It seems we need to restart all brokers to get load balancing working again.

GitHub link: https://github.com/apache/pulsar/discussions/20251#discussioncomment-5835089
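PS: in case it helps someone in the same state, it may be possible to trigger a reassignment without restarting every broker by unloading the affected namespace from a healthy broker, so the load manager redistributes its bundles. This is a minimal sketch with the Java admin client; the service URL and cluster name are placeholders for our setup:

    import org.apache.pulsar.client.admin.PulsarAdmin;

    public class RebalanceSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder admin URL: any healthy broker's HTTP endpoint works.
            try (PulsarAdmin admin = PulsarAdmin.builder()
                    .serviceHttpUrl("http://broker-1:8080")
                    .build()) {
                // Which brokers does the load manager currently see as active?
                admin.brokers().getActiveBrokers("my-cluster")
                     .forEach(System.out::println);
                // Unload the namespace so its bundles get reassigned by the
                // load manager instead of restarting all brokers.
                admin.namespaces().unload("my_tenant/my_ns");
            }
        }
    }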