otmanel31 commented on issue #12160: URL: https://github.com/apache/pulsar/issues/12160#issuecomment-933259609
1) Also, two days ago, we faced another issue (which triggered a shutdown of our deployment) where the brokers timed out on ZooKeeper calls. Before the first exception, I caught a lot of INFO logs within a few seconds, like the two below, for each topic on all brokers:

```
09:49:31.186 [ForkJoinPool.commonPool-worker-1] INFO org.apache.pulsar.broker.service.AbstractTopic - Disabling publish throttling for persistent://my_tenant/my_ns/action_down-f640fa62-3245-41af-81f5-edf868118a9a
09:49:31.186 [ForkJoinPool.commonPool-worker-1] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://my_tenant/my_ns/action_down-f640fa62-3245-41af-81f5-edf868118a9a] Policies updated successfully
```

For your information, we manage more than 25,000 topics, so we get these 2 lines for each active topic. Is there any request to ZooKeeper when these 2 log lines appear?

2) Then the first exception thrown is:

```
09:50:06.167 [pulsar-ordered-OrderedExecutor-1-0] WARN org.apache.pulsar.broker.service.BrokerService - Got exception when reading persistence policy for persistent://my_tenant/my_ns/data_down-5034936e-c9bb-4720-874d-e7e6e5e6d897: null
java.util.concurrent.TimeoutException: null
	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_252]
	at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.1.jar:2.6.1]
	at org.apache.pulsar.broker.service.BrokerService.lambda$getManagedLedgerConfig$34(BrokerService.java:1074) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
	at org.apache.bookkeeper.mledger.util.SafeRun$2.safeRun(SafeRun.java:49) [org.apache.pulsar-managed-ledger-2.6.1.jar:2.6.1]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]

09:50:14.820 [prometheus-stats-43-1] ERROR org.apache.pulsar.broker.service.BacklogQuotaManager - Failed to read policies data, will apply the default backlog quota: namespace=my_tenant/my_ns
java.util.concurrent.TimeoutException: null
	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_252]
	at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.1.jar:2.6.1]
	at org.apache.pulsar.broker.service.BacklogQuotaManager.getBacklogQuota(BacklogQuotaManager.java:64) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.getBacklogQuota(PersistentTopic.java:1859) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
	at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.getTopicStats(NamespaceStatsAggregator.java:97) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
	at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.lambda$null$0(NamespaceStatsAggregator.java:65) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
	at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.lambda$null$1(NamespaceStatsAggregator.java:64) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
	at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.lambda$generate$2(NamespaceStatsAggregator.java:63) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.1.jar:2.6.1]
	at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.generate(NamespaceStatsAggregator.java:60) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
	at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsGenerator.generate(PrometheusMetricsGenerator.java:85) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
	at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet.lambda$doGet$0(PrometheusMetricsServlet.java:70) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
	at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.6.1.jar:2.6.1]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_252]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
	at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.generate(NamespaceStatsAggregator.java:60) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
	at org.apache.pulsar.broker.stats.prometheus.NamespaceStatsAggregator.lambda$generate$2(NamespaceStatsAggregator.java:63) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
java.util.concurrent.TimeoutException: null
	at org.apache.pulsar.broker.service.BrokerService.lambda$getManagedLedgerConfig$34(BrokerService.java:1074) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
java.util.concurrent.TimeoutException: null
	at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.6.1.jar:2.6.1]
	at org.apache.pulsar.broker.stats.prometheus.PrometheusMetricsServlet.lambda$doGet$0(PrometheusMetricsServlet.java:70) ~[org.apache.pulsar-pulsar-broker-2.6.1.jar:2.6.1]
```

This exception was thrown only on my broker-0 (we have 4 running brokers and 4 ZooKeeper nodes). Then, as my broker-0 was down, it seems the load balancing did not work correctly and did not redistribute its topics to the other brokers.
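To make sure I read the traces correctly: the failing path is a synchronous `get` with a timeout on top of an asynchronous ZooKeeper read (`CompletableFuture.timedGet` → `ZooKeeperDataCache.get` → `BrokerService.getManagedLedgerConfig` / `BacklogQuotaManager.getBacklogQuota`). Below is a minimal, self-contained sketch of that pattern, not Pulsar's actual code; the class and method names (`BlockingCacheGetSketch`, `readPolicyAsync`, `readPolicyBlocking`) and the timeout value are made up for illustration. It only shows how a slow or overloaded metadata store turns into the bare `TimeoutException: null` we see in the broker logs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockingCacheGetSketch {

    // Arbitrary timeout for the demo only; NOT taken from Pulsar's configuration.
    private static final long GET_TIMEOUT_SECONDS = 2;

    // Stand-in for an asynchronous metadata read (e.g. fetching a policies z-node).
    static CompletableFuture<String> readPolicyAsync(ScheduledExecutorService zkIo, long replyAfterMs) {
        CompletableFuture<String> future = new CompletableFuture<>();
        // Simulate the ZooKeeper reply arriving after `replyAfterMs` milliseconds.
        zkIo.schedule(() -> { future.complete("{\"policies\":{}}"); }, replyAfterMs, TimeUnit.MILLISECONDS);
        return future;
    }

    // Blocking wrapper: same shape as a cache "get" that waits on the async read.
    // When the reply is late, get(...) throws a TimeoutException with a null message,
    // which the broker logs print as "TimeoutException: null".
    static String readPolicyBlocking(ScheduledExecutorService zkIo, long replyAfterMs) throws Exception {
        return readPolicyAsync(zkIo, replyAfterMs).get(GET_TIMEOUT_SECONDS, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService zkIo = Executors.newSingleThreadScheduledExecutor();
        try {
            // Healthy metadata store: reply in 50 ms, the blocking get returns normally.
            System.out.println("fast read -> " + readPolicyBlocking(zkIo, 50));

            // Overloaded metadata store: the reply arrives after the timeout has expired.
            try {
                readPolicyBlocking(zkIo, TimeUnit.SECONDS.toMillis(GET_TIMEOUT_SECONDS) + 500);
            } catch (TimeoutException e) {
                System.out.println("slow read -> " + e + " (message is " + e.getMessage() + ")");
            }
        } finally {
            zkIo.shutdownNow();
        }
    }
}
```

If the broker really does issue one such read per topic when namespace policies change, then 25,000+ near-simultaneous reads against a struggling ZooKeeper quorum would be consistent with the burst of timeouts above, but I may be misreading the traces.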
