massakam opened a new pull request #8406:
URL: https://github.com/apache/pulsar/pull/8406
### Motivation
Recently, some of our broker servers deadlocked while splitting namespace
bundles. The broker's thread dump showed that some threads were blocked in
`NamespaceService#getBundle()`.
```
"ForkJoinPool.commonPool-worker-120" #547 daemon prio=5 os_prio=0 tid=0x00007efab4020800 nid=0x1318b waiting on condition [0x00007efa229e7000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00007f385c0dc720> (a java.util.concurrent.CompletableFuture$Signaller)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
        at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
        at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at com.github.benmanes.caffeine.cache.LocalAsyncLoadingCache$LoadingCacheView.get(LocalAsyncLoadingCache.java:400)
        at org.apache.pulsar.common.naming.NamespaceBundleFactory.getBundles(NamespaceBundleFactory.java:155)
        at org.apache.pulsar.broker.namespace.NamespaceService.getBundle(NamespaceService.java:177)
        at org.apache.pulsar.broker.namespace.NamespaceService.isTopicOwned(NamespaceService.java:849)
        at org.apache.pulsar.broker.namespace.NamespaceService.isServiceUnitOwned(NamespaceService.java:813)
        at org.apache.pulsar.broker.service.BrokerService.checkTopicNsOwnership(BrokerService.java:1013)
        at org.apache.pulsar.broker.service.BrokerService.loadOrCreatePersistentTopic(BrokerService.java:625)
        at org.apache.pulsar.broker.service.BrokerService.lambda$getTopic$6(BrokerService.java:500)
        at org.apache.pulsar.broker.service.BrokerService$$Lambda$476/389775283.apply(Unknown Source)
        at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.put(ConcurrentOpenHashMap.java:274)
        at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.computeIfAbsent(ConcurrentOpenHashMap.java:129)
        at org.apache.pulsar.broker.service.BrokerService.getTopic(BrokerService.java:499)
        at org.apache.pulsar.broker.service.BrokerService.getOrCreateTopic(BrokerService.java:483)
        at org.apache.pulsar.broker.service.ServerCnx.lambda$null$13(ServerCnx.java:681)
        at org.apache.pulsar.broker.service.ServerCnx$$Lambda$835/1815803313.apply(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
        at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575)
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:943)
        at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:163)
```
I think this is the same deadlock that
https://github.com/apache/pulsar/pull/4190 was meant to fix. It seems that
https://github.com/apache/pulsar/pull/4190 was later reverted by
https://github.com/apache/pulsar/pull/5919.
### Modifications
The blocking method `getBundle()` should not be used in
`NamespaceService#isTopicOwned()`. However, simply reverting
https://github.com/apache/pulsar/pull/5919 would reintroduce the issue where
clients cannot reconnect to a topic in a split bundle.
So, when the bundle metadata is not yet cached, `isTopicOwned()` returns false
once but fetches the bundle metadata asynchronously so that it gets cached. The
next time the client reconnects, the bundle metadata is already cached, so
`isTopicOwned()` can return the correct result.
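
The idea can be sketched roughly as follows. This is only a minimal illustration, not the code in this PR: `OwnershipCheckSketch`, `getBundleIfPresent()`, `getBundleAsync()`, the `String` bundle type, and the injected `loader` are hypothetical stand-ins for the real `NamespaceService` and `NamespaceBundleFactory` types.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch; names and types are assumptions, not Pulsar's API.
class OwnershipCheckSketch {
    // Async cache: namespace -> future holding the bundle metadata.
    private final ConcurrentMap<String, CompletableFuture<String>> bundleCache =
            new ConcurrentHashMap<>();
    // Starts an async metadata load and returns immediately (assumed).
    private final Function<String, CompletableFuture<String>> loader;
    // Bundles owned by this broker (assumed to be known locally).
    private final Set<String> ownedBundles;

    OwnershipCheckSketch(Function<String, CompletableFuture<String>> loader,
                         Set<String> ownedBundles) {
        this.loader = loader;
        this.ownedBundles = ownedBundles;
    }

    // Non-blocking lookup: returns a value only if the load already finished.
    Optional<String> getBundleIfPresent(String namespace) {
        CompletableFuture<String> f = bundleCache.get(namespace);
        if (f != null && f.isDone() && !f.isCompletedExceptionally()) {
            // join() cannot block here because the future is already complete.
            return Optional.ofNullable(f.join());
        }
        return Optional.empty();
    }

    // Triggers the load if nobody has yet; never waits for the result.
    CompletableFuture<String> getBundleAsync(String namespace) {
        return bundleCache.computeIfAbsent(namespace, loader);
    }

    boolean isTopicOwned(String namespace) {
        Optional<String> bundle = getBundleIfPresent(namespace);
        if (!bundle.isPresent()) {
            // Metadata not cached yet: warm the cache and answer "not owned"
            // for now instead of parking the calling thread.
            getBundleAsync(namespace);
            return false;
        }
        return ownedBundles.contains(bundle.get());
    }
}
```

With this shape, the first call for an uncached namespace returns false without blocking, and a later call (for example, after the client retries) sees the cached metadata. Eviction of failed loads is omitted from the sketch.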