GitHub user otmanel31 added a comment to the discussion: [question] pulsar broker reboot

1) Also, two days ago we faced another issue (which triggered a shutdown of our deployment), where brokers timed out on ZooKeeper calls. Before the first exception, I caught a large number of INFO log lines within a few seconds, like the ones below, for each topic on all brokers:
09:49:31.186 [ForkJoinPool.commonPool-worker-1] INFO  org.apache.pulsar.broker.service.AbstractTopic - Disabling publish throttling for persistent://my_tenant/my_ns/action_down-f640fa62-3245-41af-81f5-edf868118a9a
09:49:31.186 [ForkJoinPool.commonPool-worker-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://my_tenant/my_ns/action_down-f640fa62-3245-41af-81f5-edf868118a9a] Policies updated successfully
For your information, we manage more than 25,000 topics, so we get these two lines for each active topic.
Is any request made to ZooKeeper when these two log lines appear?
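For scale, here is a minimal, hypothetical sketch (not Pulsar code, just a model of the reported behavior) of why a single namespace-level policy change can flood the logs: if the update fans out into one callback per loaded topic, and each callback emits the two lines above, the volume is proportional to the topic count.

```java
import java.util.stream.IntStream;

public class Main {
    public static void main(String[] args) {
        // Hypothetical model: one namespace policy update triggers a
        // per-topic callback on every loaded topic in that namespace.
        int loadedTopics = 25_000;     // topic count from the report above
        int logLinesPerTopic = 2;      // "Disabling publish throttling" + "Policies updated successfully"
        long totalLogLines = IntStream.range(0, loadedTopics)
                                      .mapToLong(t -> logLinesPerTopic)
                                      .sum();
        System.out.println(totalLogLines); // 50000 log lines within seconds
    }
}
```

Under this assumption the 50,000 lines in a few seconds are expected from the fan-out alone, independent of how many ZooKeeper reads actually happen.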

2) The first exception thrown is:

09:50:06.167 [pulsar-ordered-OrderedExecutor-1-0] WARN  org.apache.pulsar.broker.service.BrokerService - Got exception when reading persistence policy for persistent://my_tenant/my_ns/data_down-5034936e-c9bb-4720-874d-e7e6e5e6d897: null
java.util.concurrent.TimeoutException: null
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_252]
at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.1.jar:2.6.1]
at org.apache.pulsar.broker.service.BrokerService.lambda$getManagedLedgerConfig$34(BrokerService.java:1074) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
at org.apache.bookkeeper.mledger.util.SafeRun$2.safeRun(SafeRun.java:49) [org.apache.pulsar-managed-ledger-2.6.1.jar:2.6.1]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
09:50:14.820 [prometheus-stats-43-1] ERROR org.apache.pulsar.broker.service.BacklogQuotaManager - Failed to read policies data, will apply the default backlog quota: namespace=my_tenant/my_ns
java.util.concurrent.TimeoutException: null
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_252]
at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.1.jar:2.6.1]
at org.apache.pulsar.broker.service.BacklogQuotaManager.getBacklogQuota(BacklogQuotaManager.java:64) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
at org.apache.pulsar.broker.service.persistent.PersistentTopic.getBacklogQuota(PersistentTopic.java:1859) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.getTopicStats(NamespaceStatsAggregator.java:97) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.lambda$null$0(NamespaceStatsAggregator.java:65) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.lambda$null$1(NamespaceStatsAggregator.java:64) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.lambda$generate$2(NamespaceStatsAggregator.java:63) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.generate(NamespaceStatsAggregator.java:60) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsGenerator.generate(PrometheusMetricsGenerator.java:85) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet.lambda$doGet$0(PrometheusMetricsServlet.java:70) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.6.1.jar:2.6.1]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_252]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.generate(NamespaceStatsAggregator.java:60) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.lambda$generate$2(NamespaceStatsAggregator.java:63) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
java.util.concurrent.TimeoutException: null
at org.apache.pulsar.broker.service.BrokerService.lambda$getManagedLedgerConfig$34(BrokerService.java:1074) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
java.util.concurrent.TimeoutException: null
at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.6.1.jar:2.6.1]
at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet.lambda$doGet$0(PrometheusMetricsServlet.java:70) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
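A note on the "null" in those TimeoutException lines: it is simply an exception with no message. The trace shows the blocking get-with-timeout pattern (CompletableFuture.get inside ZooKeeperDataCache.get), and when the underlying ZooKeeper read does not complete in time, this is exactly what gets logged. A minimal standalone reproduction of that failure shape, using only the JDK:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class Main {
    public static void main(String[] args) throws Exception {
        // A future that never completes stands in for a ZooKeeper read
        // stuck behind a flood of other requests or a slow session.
        CompletableFuture<String> slowZkRead = new CompletableFuture<>();
        try {
            // Like the cache read in the trace: block with a timeout.
            slowZkRead.get(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // TimeoutException carries no message, hence "null" in the logs.
            System.out.println("TimeoutException message: " + e.getMessage());
        }
    }
}
```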

This exception was thrown only on my broker-0 (we have 4 running brokers and 4 ZooKeeper nodes).
Then, with broker-0 down, load balancing did not seem to work correctly: its topics were not redistributed to the other brokers. It seems we need to restart all brokers to get load balancing working again.



GitHub link: 
https://github.com/apache/pulsar/discussions/20251#discussioncomment-5835089

----
This is an automatically sent email for commits@pulsar.apache.org.
To unsubscribe, please send an email to: commits-unsubscr...@pulsar.apache.org
