TakaHiR07 opened a new issue, #23101:
URL: https://github.com/apache/pulsar/issues/23101

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Read release policy
   
   - [X] I understand that unsupported versions don't get bug fixes. I will 
attempt to reproduce the issue on a supported version of Pulsar client and 
Pulsar broker.
   
   
   ### Version
   
   branch-3.0.5
   
   ### Minimal reproduce step
   
   Run a perf producer (`pulsar-perf produce`) against a topic and keep restarting the broker.
   
   ### What did you expect to see?
   
   ...
   
   ### What did you see instead?
   
   Topic loading gets stuck and the topic becomes unavailable. After adding 
extra logging, we found the cause:
   
   ```
        2024-07-30T12:36:39,377+0800 [main] INFO  org.apache.pulsar.broker.PulsarService - Git Branch ${git.branch}
        2024-07-30T12:36:46,072+0800 [main] INFO  org.apache.pulsar.broker.PulsarService - Starting load management service ...
        2024-07-30T12:36:47,632+0800 [pulsar-io-23-29] INFO  org.apache.pulsar.broker.service.ServerCnx - [/xxx:xxx] connected with role=admin using authMethod=token, clientVersion=2.9.5, clientProtocolVersion=19, proxyVersion=null
        2024-07-30T12:36:47,347+0800 [BookKeeperClientWorker-OrderedExecutor-11-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [test/test/persistent/test-partition-48] Successfully initialize managed ledger
        2024-07-30T12:36:47,790+0800 [BookKeeperClientWorker-OrderedExecutor-11-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [test/test/persistent/test-partition-48] open ledger failed. exception:java.util.concurrent.CompletionException: java.lang.NullPointerException: Cannot invoke "org.apache.pulsar.broker.transaction.pendingack.TransactionPendingAckStoreProvider.checkInitializedBefore(org.apache.pulsar.broker.service.persistent.PersistentSubscription)" because "this.pendingAckStoreProvider" is null
        2024-07-30T12:36:48,449+0800 [main] INFO  org.apache.pulsar.broker.namespace.NamespaceService - added heartbeat namespace name in local cache: ns=pulsar/xxx:xxx
   
        2024-07-30T12:37:46,171+0800 [pulsar-io-23-8] ERROR org.apache.pulsar.broker.service.ServerCnx - [/xxx:xxx] Failed to create topic persistent://test/test/test-partition-48, producerId=828
        java.util.concurrent.CompletionException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to load topic within timeout
   ```
   
   ### Anything else?
   
   From the log above, the cause is that the broker can serve requests 
before it is fully initialized. I believe this part is fixed by 
https://github.com/apache/pulsar/pull/22977
   
   But there is another issue. The NPE stayed hidden until we added extra 
logging. The reason is that createTopic only catches PulsarServerException 
here, so any other exception is thrown to 
ManagedLedgerFactoryImpl#exceptionally. That handler does not trigger 
callback.openLedgerFailed, because the callback has already triggered 
openLedgerComplete. As a result, the exception is swallowed and the topic 
cannot be loaded again.
   
   
https://github.com/apache/pulsar/blob/6bbaec1f6b1cc09de42f14dccca1afd932c547d5/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1802-L1809
   
   
https://github.com/apache/pulsar/blob/6bbaec1f6b1cc09de42f14dccca1afd932c547d5/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerFactoryImpl.java#L450-L456
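
To make the swallowing mechanism easier to follow, here is a minimal, self-contained sketch of the pattern. It is not Pulsar code: `OpenCallback`, `ExpectedException`, `asyncOpen`, `createTopic`, and `loadTopic` are hypothetical stand-ins for the callback, `PulsarServerException`, the factory's open path, and the broker's topic creation. The `catchAll` flag shows one possible way to surface the failure, assuming that completing the topic future exceptionally is acceptable; it is not necessarily the fix the project would choose.

```java
import java.util.concurrent.CompletableFuture;

class SwallowedOpenFailureSketch {

    /** Hypothetical stand-in for the open-ledger callback; not Pulsar's real interface. */
    interface OpenCallback {
        void openComplete(String ledgerName);
        void openFailed(Throwable cause);
    }

    /** Plays the role of the one exception type the callback expects. */
    static class ExpectedException extends Exception {
        ExpectedException(String message) { super(message); }
    }

    /** Simulated factory: the open succeeds, then the callback is notified. */
    static void asyncOpen(String ledgerName, OpenCallback callback) {
        CompletableFuture.completedFuture(ledgerName)
                .thenAccept(callback::openComplete)
                .exceptionally(ex -> {
                    // openComplete has already been invoked, so openFailed is
                    // not called here. Without a log line the error vanishes.
                    return null;
                });
    }

    /** Simulated topic creation that hits an NPE when a dependency is not ready. */
    static void createTopic(String ledgerName, boolean providerReady) throws ExpectedException {
        if (ledgerName.isEmpty()) {
            throw new ExpectedException("empty ledger name");
        }
        Object provider = providerReady ? new Object() : null;
        provider.hashCode(); // NullPointerException when the provider is missing
    }

    static CompletableFuture<String> loadTopic(String ledgerName, boolean providerReady, boolean catchAll) {
        CompletableFuture<String> topicFuture = new CompletableFuture<>();
        asyncOpen(ledgerName, new OpenCallback() {
            @Override
            public void openComplete(String name) {
                try {
                    createTopic(name, providerReady);
                    topicFuture.complete(name);
                } catch (ExpectedException e) {
                    topicFuture.completeExceptionally(e);
                } catch (RuntimeException e) {
                    if (!catchAll) {
                        throw e; // narrow catch: escapes, topicFuture is never completed
                    }
                    topicFuture.completeExceptionally(e); // broad catch: caller sees the failure
                }
            }

            @Override
            public void openFailed(Throwable cause) {
                topicFuture.completeExceptionally(cause);
            }
        });
        return topicFuture;
    }

    public static void main(String[] args) {
        // Narrow catch: the future stays pending forever, so callers only see a timeout.
        System.out.println("narrow catch done: " + loadTopic("t", false, false).isDone());
        // Broad catch: the future fails fast with the real cause.
        System.out.println("broad catch failed: " + loadTopic("t", false, true).isCompletedExceptionally());
    }
}
```

With the narrow catch, the future returned by `loadTopic` is never completed, which matches the "Failed to load topic within timeout" symptom in the log: the caller only sees the timeout, not the underlying NPE.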
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!

