TakaHiR07 opened a new issue, #23101: URL: https://github.com/apache/pulsar/issues/23101
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.

### Read release policy

- [X] I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

### Version

branch-3.0.5

### Minimal reproduce step

Run perf produce while repeatedly restarting the broker.

### What did you expect to see?

...

### What did you see instead?

Topic loading gets stuck and the topic becomes unavailable. We added extra logging and found the problem.

```
2024-07-30T12:36:39,377+0800 [main] INFO org.apache.pulsar.broker.PulsarService - Git Branch ${git.branch}
2024-07-30T12:36:46,072+0800 [main] INFO org.apache.pulsar.broker.PulsarService - Starting load management service
...
2024-07-30T12:36:47,632+0800 [pulsar-io-23-29] INFO org.apache.pulsar.broker.service.ServerCnx - [/xxx:xxx] connected with role=admin using authMethod=token, clientVersion=2.9.5, clientProtocolVersion=19, proxyVersion=null
2024-07-30T12:36:47,347+0800 [BookKeeperClientWorker-OrderedExecutor-11-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [test/test/persistent/test-partition-48] Successfully initialize managed ledger
2024-07-30T12:36:47,790+0800 [BookKeeperClientWorker-OrderedExecutor-11-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [test/test/persistent/test-partition-48] open ledger failed.
exception:java.util.concurrent.CompletionException: java.lang.NullPointerException: Cannot invoke "org.apache.pulsar.broker.transaction.pendingack.TransactionPendingAckStoreProvider.checkInitializedBefore(org.apache.pulsar.broker.service.persistent.PersistentSubscription)" because "this.pendingAckStoreProvider" is null
2024-07-30T12:36:48,449+0800 [main] INFO org.apache.pulsar.broker.namespace.NamespaceService - added heartbeat namespace name in local cache: ns=pulsar/xxx:xxx
2024-07-30T12:37:46,171+0800 [pulsar-io-23-8] ERROR org.apache.pulsar.broker.service.ServerCnx - [/xxx:xxx] Failed to create topic persistent://test/test/test-partition-48, producerId=828
java.util.concurrent.CompletionException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to load topic within timeout
```

### Anything else?

From the log above, we can see that the cause is that the broker can serve requests before it is fully initialized. I guess this issue is fixed in https://github.com/apache/pulsar/pull/22977.

But there is another issue: this NPE stays hidden until we add extra logging. The reason is that createTopic only catches PulsarServerException here, so any other exception is thrown to ManagedLedgerFactoryImpl#exceptionally. That handler does not trigger callback.openLedgerFailed, because the callback has already triggered openLedgerComplete. Therefore the exception is hidden and the topic cannot be loaded again.

https://github.com/apache/pulsar/blob/6bbaec1f6b1cc09de42f14dccca1afd932c547d5/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1802-L1809

https://github.com/apache/pulsar/blob/6bbaec1f6b1cc09de42f14dccca1afd932c547d5/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerFactoryImpl.java#L450-L456

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!
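The hidden-exception path described under "Anything else?" can be sketched in isolation. This is a minimal illustration, not the Pulsar code: `HiddenOpenFailure`, `OneShotCallback`, `createTopic`, and `open` are hypothetical names, `IllegalStateException` stands in for the one exception type that is caught, and the one-shot guard is only an assumption about why a later `openLedgerFailed` has no effect once `openLedgerComplete` has fired.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

public class HiddenOpenFailure {

    /** Hypothetical one-shot callback: once either method has fired, later calls are ignored. */
    static final class OneShotCallback {
        final CompletableFuture<String> topicFuture = new CompletableFuture<>();
        private final AtomicBoolean fired = new AtomicBoolean();
        private final boolean catchAll;

        OneShotCallback(boolean catchAll) {
            this.catchAll = catchAll;
        }

        void openComplete(String ledger) {
            if (!fired.compareAndSet(false, true)) {
                return;
            }
            try {
                createTopic(ledger);
                topicFuture.complete(ledger);
            } catch (RuntimeException e) {
                // Narrow mode handles only one exception type (IllegalStateException is a
                // stand-in); anything else escapes without completing topicFuture.
                if (!catchAll && !(e instanceof IllegalStateException)) {
                    throw e;
                }
                topicFuture.completeExceptionally(e);
            }
        }

        void openFailed(Throwable error) {
            if (!fired.compareAndSet(false, true)) {
                return; // already fired via openComplete, so the failure is dropped
            }
            topicFuture.completeExceptionally(error);
        }
    }

    /** Simulates topic creation hitting a component that is not initialized yet. */
    static void createTopic(String ledger) {
        throw new NullPointerException("provider is null while opening " + ledger);
    }

    /** Hypothetical async open: success path first, failure handler chained after it. */
    static CompletableFuture<String> open(boolean catchAll) {
        OneShotCallback callback = new OneShotCallback(catchAll);
        CompletableFuture.completedFuture("ledger-1")
                .thenAccept(callback::openComplete)
                .exceptionally(ex -> {
                    callback.openFailed(ex); // no effect if openComplete already fired
                    return null;
                });
        return callback.topicFuture;
    }

    public static void main(String[] args) {
        System.out.println("narrow catch, done = " + open(false).isDone());
        System.out.println("catch-all, done = " + open(true).isDone());
    }
}
```

With the narrow catch the returned future never completes, which matches the symptom of a topic load that only ends with a timeout. With the catch-all variant the future completes exceptionally, so the caller sees the NPE right away.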
