Technoboy- opened a new pull request, #17689:
URL: https://github.com/apache/pulsar/pull/17689
Cherry-pick #15755

Master issue #15643, #15753

### Motivation

Blocked at BrokerService#unloadNamespaceBundlesGracefully:

```
2022-05-20T03:37:05.4960249Z "main" #1 prio=5 os_prio=0 cpu=32274.29ms elapsed=2566.54s tid=0x00007fd108024380 nid=0x1af8f waiting on condition [0x00007fd10fcd0000]
2022-05-20T03:37:05.4960659Z java.lang.Thread.State: WAITING (parking)
2022-05-20T03:37:05.4961114Z at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
2022-05-20T03:37:05.4961875Z - parking to wait for <0x00000000cdf00010> (a java.util.concurrent.CompletableFuture$Signaller)
2022-05-20T03:37:05.4962343Z at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
2022-05-20T03:37:05.4963171Z at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1864)
2022-05-20T03:37:05.4963683Z at java.util.concurrent.ForkJoinPool.unmanagedBlock([email protected]/ForkJoinPool.java:3463)
2022-05-20T03:37:05.4964169Z at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3434)
2022-05-20T03:37:05.4964660Z at java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1898)
2022-05-20T03:37:05.4965158Z at java.util.concurrent.CompletableFuture.get([email protected]/CompletableFuture.java:2072)
2022-05-20T03:37:05.4965715Z at org.apache.pulsar.broker.service.BrokerService.lambda$unloadNamespaceBundlesGracefully$21(BrokerService.java:919)
2022-05-20T03:37:05.4966467Z at org.apache.pulsar.broker.service.BrokerService$$Lambda$1164/0x0000000801527c70.accept(Unknown Source)
2022-05-20T03:37:05.4966882Z at java.lang.Iterable.forEach([email protected]/Iterable.java:75)
2022-05-20T03:37:05.4967408Z at org.apache.pulsar.broker.service.BrokerService.unloadNamespaceBundlesGracefully(BrokerService.java:911)
2022-05-20T03:37:05.4968078Z at org.apache.pulsar.broker.service.BrokerService.unloadNamespaceBundlesGracefully(BrokerService.java:887)
2022-05-20T03:37:05.4968664Z at org.apache.pulsar.broker.service.BrokerService.closeAsync(BrokerService.java:732)
2022-05-20T03:37:05.4969579Z at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:450)
2022-05-20T03:37:05.4970123Z at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:372)
2022-05-20T03:37:05.4970720Z at
```

Blocked at CoordinationServiceImpl#close:

```
2022-05-20T01:17:56.3359346Z "main" #1 prio=5 os_prio=0 cpu=11209.07ms elapsed=3506.06s tid=0x00007f9484024380 nid=0xaba waiting on condition [0x00007f9489edd000]
2022-05-20T01:17:56.3361587Z java.lang.Thread.State: WAITING (parking)
2022-05-20T01:17:56.3363789Z at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
2022-05-20T01:17:56.3366545Z - parking to wait for <0x00000000cd180010> (a java.util.concurrent.CompletableFuture$Signaller)
2022-05-20T01:17:56.3368917Z at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
2022-05-20T01:17:56.3371298Z at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1864)
2022-05-20T01:17:56.3373823Z at java.util.concurrent.ForkJoinPool.unmanagedBlock([email protected]/ForkJoinPool.java:3463)
2022-05-20T01:17:56.3376212Z at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3434)
2022-05-20T01:17:56.3378608Z at java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1898)
2022-05-20T01:17:56.3380999Z at java.util.concurrent.CompletableFuture.join([email protected]/CompletableFuture.java:2117)
2022-05-20T01:17:56.3383947Z at org.apache.pulsar.metadata.coordination.impl.CoordinationServiceImpl.close(CoordinationServiceImpl.java:72)
2022-05-20T01:17:56.3386574Z at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:526)
2022-05-20T01:17:56.3388569Z at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:372)
```

For
BrokerService#unloadNamespaceBundlesGracefully, the request chain is:

```
brokerService.closeAsync()
-> OwnedBundle.handleUnloadRequest
-> pulsar.getNamespaceService().getOwnershipCache().removeOwnership(bundle)
-> OwnershipCache.removeOwnership
-> ResourceLock.release
```

For CoordinationServiceImpl#close, the request chain is:

```
CoordinationServiceImpl.close
-> LockManager.asyncClose
-> ResourceLock.release
```

In both cases the blocked call ends in ResourceLock#release. Since CI uses MockZooKeeper, I found that if a RuntimeException is thrown while the mock handles a request, the response is never sent and the caller waits forever. So I added a catch block to make sure every request gets a reply. However, I'm not sure whether the return code used in that catch block is the right one.

https://github.com/apache/pulsar/blob/3a8045851f7e9ea62da104dab2b7fe2b47a95ca9/testmocks/src/main/java/org/apache/zookeeper/MockZooKeeper.java#L332-L402
https://github.com/apache/pulsar/blob/3a8045851f7e9ea62da104dab2b7fe2b47a95ca9/testmocks/src/main/java/org/apache/zookeeper/MockZooKeeper.java#L916-L976

In addition, the current close process has an ordering issue: LoadManager is closed before BrokerService, but closing BrokerService still needs to invoke LoadManager. LoadManager is stateless, so this may not break anything, but the order is confusing.

https://github.com/apache/pulsar/blob/3a8045851f7e9ea62da104dab2b7fe2b47a95ca9/pulsar-broker/src/main/java/org/apache/pulsar/broker/PulsarService.java#L443-L452
https://github.com/apache/pulsar/blob/3a8045851f7e9ea62da104dab2b7fe2b47a95ca9/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L891-L902

### Documentation

- [x] `no-need-doc` (Please explain why)
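The catch-block fix described in the Motivation can be illustrated with a minimal, self-contained sketch of the "always reply" pattern, assuming a callback-based mock store in the style of MockZooKeeper. It does not use the real MockZooKeeper or ZooKeeper classes: `AlwaysReplySketch`, `VoidCallback`, `OK`, and `SYSTEM_ERROR` are hypothetical names, and `SYSTEM_ERROR` is only a placeholder, not a claim about which return code the mock should use. Without the `catch`, a `RuntimeException` thrown inside the task escapes to the executor, the callback never runs, and a caller blocked in `CompletableFuture.get()` or `join()` parks forever, as in the traces above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AlwaysReplySketch {
    // Hypothetical result codes; the real mock uses ZooKeeper's KeeperException.Code values.
    static final int OK = 0;
    static final int SYSTEM_ERROR = -1;

    // Hypothetical callback, shaped loosely like ZooKeeper's AsyncCallback.VoidCallback.
    interface VoidCallback {
        void processResult(int rc, String path);
    }

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Submits an async delete; `op` stands in for the mock's internal work,
    // which might throw a RuntimeException.
    void delete(String path, Runnable op, VoidCallback cb) {
        executor.execute(() -> {
            int rc;
            try {
                op.run();
                rc = OK;
            } catch (RuntimeException e) {
                // Without this catch the exception escapes the task, cb is never
                // called, and any future waiting on it never completes.
                rc = SYSTEM_ERROR;
            }
            cb.processResult(rc, path);
        });
    }

    // Adapts the callback API to a CompletableFuture, as a lock release might.
    CompletableFuture<Void> deleteAsync(String path, Runnable op) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        delete(path, op, (rc, p) -> {
            if (rc == OK) {
                future.complete(null);
            } else {
                future.completeExceptionally(
                        new IllegalStateException("rc=" + rc + " path=" + p));
            }
        });
        return future;
    }

    void shutdown() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        AlwaysReplySketch store = new AlwaysReplySketch();
        CompletableFuture<Void> ok = store.deleteAsync("/lock/a", () -> {});
        CompletableFuture<Void> failing = store.deleteAsync("/lock/b", () -> {
            throw new RuntimeException("simulated mock failure");
        });
        // Both futures complete within the timeout instead of hanging.
        ok.get(5, TimeUnit.SECONDS);
        System.out.println("ok completed normally: " + !ok.isCompletedExceptionally());
        try {
            failing.get(5, TimeUnit.SECONDS);
        } catch (ExecutionException e) {
            System.out.println("failing completed exceptionally: " + e.getCause().getMessage());
        }
        store.shutdown();
    }
}
```

The design choice here is that the callback is invoked exactly once on every path; which error code to report is a separate question, and the one chosen in the sketch is arbitrary.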

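The close-ordering concern from the Motivation can be sketched the same way. This is a hypothetical illustration, not the PulsarService code: `CloseOrderSketch`, `AsyncCloseable`, `component`, and `closeInOrder` are made-up names. It chains `closeAsync()` calls with `thenCompose` so that each component starts closing only after the previous one has finished, and it lists BrokerService before LoadManager because BrokerService still calls into LoadManager while unloading bundles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class CloseOrderSketch {
    // Hypothetical stand-in for a broker component with an async close.
    interface AsyncCloseable {
        CompletableFuture<Void> closeAsync();
    }

    // Records the order in which components finish closing.
    static final List<String> closed = Collections.synchronizedList(new ArrayList<>());

    // Builds a hypothetical component whose close just records its name.
    static AsyncCloseable component(String name) {
        return () -> CompletableFuture.runAsync(() -> closed.add(name));
    }

    // Closes components strictly in list order: each close starts only after the
    // previous one has completed. Dependents must come before their dependencies.
    static CompletableFuture<Void> closeInOrder(List<AsyncCloseable> components) {
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
        for (AsyncCloseable c : components) {
            chain = chain.thenCompose(ignored -> c.closeAsync());
        }
        return chain;
    }

    public static void main(String[] args) {
        AsyncCloseable brokerService = component("BrokerService");
        AsyncCloseable loadManager = component("LoadManager");
        closeInOrder(List.of(brokerService, loadManager)).join();
        System.out.println(closed); // prints [BrokerService, LoadManager]
    }
}
```

One consequence of chaining with `thenCompose` is that if one component's close fails, the remaining closes are skipped; a real shutdown path may prefer to keep closing the rest and report the failure afterwards.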