bharanic-dev opened a new pull request #14634: URL: https://github.com/apache/pulsar/pull/14634
Fixes #14633

### Motivation

https://github.com/apache/pulsar/blob/2b3e8aeb5a1c259e0325e5a91dc5d7e20c6ee569/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1757 makes a blocking call to a metadata-store operation (stack trace below) while holding a lock, because the method is declared `synchronized`. The callback that completes the future cannot run: the metadata-store executor is single-threaded, and the callback is blocked waiting for the lock held by this thread.

"pulsar-backlog-quota-checker-30-1" #88 prio=5 os_prio=0 cpu=662.90ms elapsed=81026.81s tid=0x00007f17031ce000 nid=0xa4 waiting on condition [0x00007f15a631e000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base/Native Method)
    - parking to wait for <0x00000007c1f55408> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.parkNanos(java.base/LockSupport.java:234)
    at java.util.concurrent.CompletableFuture$Signaller.block(java.base/CompletableFuture.java:1798)
    at java.util.concurrent.ForkJoinPool.managedBlock(java.base/ForkJoinPool.java:3128)
    at java.util.concurrent.CompletableFuture.timedGet(java.base/CompletableFuture.java:1868)
    at java.util.concurrent.CompletableFuture.get(java.base/CompletableFuture.java:2021)
    at org.apache.pulsar.broker.resources.BaseResources.get(BaseResources.java:86)
    at org.apache.pulsar.broker.resources.NamespaceResources.getPolicies(NamespaceResources.java:105)
    at org.apache.pulsar.broker.service.BacklogQuotaManager.getBacklogQuota(BacklogQuotaManager.java:70)
    at org.apache.pulsar.broker.service.BacklogQuotaManager.getBacklogQuota(BacklogQuotaManager.java:82)
    at org.apache.pulsar.broker.service.BacklogQuotaManager.getBacklogQuotaLimitInSize(BacklogQuotaManager.java:101)
    at org.apache.pulsar.broker.service.persistent.PersistentTopic.isSizeBacklogExceeded(PersistentTopic.java:2502)
    at org.apache.pulsar.broker.service.BrokerService.lambda$monitorBacklogQuota$69(BrokerService.java:1611)
    at org.apache.pulsar.broker.service.BrokerService$$Lambda$713/0x00000008406a9840.accept(Unknown Source)
    at org.apache.pulsar.broker.service.BrokerService$$Lambda$709/0x00000008406a8840.accept(Unknown Source)
    at java.util.Optional.ifPresent(java.base/Optional.java:183)
    at org.apache.pulsar.broker.service.BrokerService.lambda$forEachTopic$68(BrokerService.java:1599)
    at org.apache.pulsar.broker.service.BrokerService$$Lambda$708/0x00000008406a8440.accept(Unknown Source)
    at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387)
    at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159)
    at org.apache.pulsar.broker.service.BrokerService.forEachTopic(BrokerService.java:1597)
    at org.apache.pulsar.broker.service.BrokerService.monitorBacklogQuota(BrokerService.java:1608)
    - locked <0x00000003018b1c30> (a org.apache.pulsar.broker.service.BrokerService)
    at org.apache.pulsar.broker.service.BrokerService$$Lambda$320/0x00000008403f4840.run(Unknown Source)
    at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32)
    at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
    at java.util.concurrent.Executors$RunnableAdapter.call(java.base/Executors.java:515)
    at java.util.concurrent.FutureTask.runAndReset(java.base/FutureTask.java:305)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base/ScheduledThreadPoolExecutor.java:305)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base/ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base/ThreadPoolExecutor.java:628)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(java.base/Thread.java:829)

https://github.com/apache/pulsar/blob/0a91196dcc4d31ae647867ed319b8c1af0cb93c6/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java#L78

### Modifications

The `synchronized` keyword was added as part of https://github.com/apache/pulsar/pull/14367, and it is what causes the deadlock. It is not actually required, because the topics data structure is a concurrent hash map. This change removes it.

### Verifying this change

- [x] Make sure that the change passes the CI checks.

This change is a trivial rework / code cleanup without any test coverage.

The fix was also verified in production. After deploying it, the broker deadlocks and restarts went away.

### Documentation

- [x] `no-need-doc`

Internal fix. Not user visible.
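
For reviewers who want to see the failure mode described under Motivation in isolation, here is a minimal sketch. It is not Pulsar code: `SynchronizedDeadlockSketch` and all of its methods are hypothetical names. A single-threaded `ExecutorService` stands in for the metadata-store executor, and a `synchronized` callback stands in for the callback that needs the lock. The wait uses a timeout so the sketch terminates instead of hanging.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; none of these names exist in Pulsar.
public class SynchronizedDeadlockSketch {
    // Stand-in for the single-threaded metadata-store executor.
    private final ExecutorService callbackExecutor = Executors.newSingleThreadExecutor();

    // Stand-in for a callback that must take the same monitor before it
    // can complete the future.
    private synchronized void onMetadataLoaded(CompletableFuture<String> result) {
        result.complete("policies");
    }

    // Schedules the callback, then blocks the caller until the future
    // completes or the timeout expires. Returns true if it completed in time.
    private boolean blockingGet(long timeoutMs) {
        CompletableFuture<String> result = new CompletableFuture<>();
        callbackExecutor.execute(() -> onMetadataLoaded(result));
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Shape of the problem: the caller holds the monitor while it waits, so
    // the callback thread can never enter onMetadataLoaded.
    public synchronized boolean checkWithLock(long timeoutMs) {
        return blockingGet(timeoutMs);
    }

    // Shape of the fix: no monitor is held while waiting, so the callback
    // runs and the future completes.
    public boolean checkWithoutLock(long timeoutMs) {
        return blockingGet(timeoutMs);
    }

    public void shutdown() {
        callbackExecutor.shutdownNow();
    }

    public static void main(String[] args) {
        SynchronizedDeadlockSketch sketch = new SynchronizedDeadlockSketch();
        try {
            System.out.println("with lock: completed=" + sketch.checkWithLock(500));
            System.out.println("without lock: completed=" + sketch.checkWithoutLock(2000));
        } finally {
            sketch.shutdown();
        }
    }
}
```

With the lock held, the wait times out, which corresponds to the `TIMED_WAITING` state in the thread dump; without it, the callback completes the future promptly.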
