rdhabalia opened a new issue, #20393: URL: https://github.com/apache/pulsar/issues/20393
### Search before asking - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar. ### Version > pulsar-2.10, 2.11 ### Minimal reproduce step 1. Sometime Broker tries to recover the topic cursor and couldn't create subscription successfully which causes consumer creation stuck at broker and client application times out and fails to subscribe on topic. 2. Broker log ``` 05:50:28.411 [pulsar-io-4-32] INFO org.apache.pulsar.broker.service.ServerCnx - [/1.1.1.1:38126] Subscribing on topic persistent://pulsar/use1/myns/topic-1 / my-sub 05:50:28.411 [pulsar-io-4-32] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [pulsar/use1/myns/persistent/topic-1] Cursor my-sub recovered to position 52505149:179 05:50:28.412 [pulsar-io-4-32] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [pulsar/use1/myns/persistent/topic-1] Consumer my-sub cursor initialized with counters: consumed 0 mdPos 52505149:179 rdPos 52505149:180 05:50:28.412 [pulsar-io-4-32] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [pulsar/use1/myns/persistent/topic-1] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[112, 117, 108, 115, 97, 114, 47, 112, 115, 116, 103, 45, 103, 113, 49, 47, 112, 115, 116, 103, 49, 45, 98, 114, 111, 107, 101, 114, 50, 46, 109, 101, 115, 115, 97, 103, 105, 110, 103, 46, 103, 113, 49, 46, 121, 97, 104, 111, 111, 46, 99, 111, 109, 58, 52, 48, 56, 48, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 115, 97, 110, 105, 116, 121, 84, 101, 115, 116, 84, 111, 112, 105, 99, 45, 49], pulsar/cursor=[115, 97, 110, 105, 116, 121, 83, 117, 98], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds 05:50:28.421 [main-EventThread] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [pulsar/use1/myns/persistent/topic-1] Created ledger 52506173 for cursor my-sub 05:50:29.860 [bookkeeper-ml-scheduler-OrderedScheduler-5-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [pulsar/use1/myns/persistent/topic-1] Ledger already created when timeout task is triggered 05:51:28.412 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [pulsar/use1/myns/persistent/topic-1] Ledger already created when timeout task is triggered 05:51:40.443 [pulsar-io-4-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/1.1.1.1:38174] Subscribing on topic persistent://pulsar/use1/myns/topic-1 / my-sub 05:53:47.732 [pulsar-io-4-5] INFO org.apache.pulsar.broker.service.ServerCnx - [/1.1.1.1:38206] Subscribing on topic persistent://pulsar/use1/myns/topic-1 / my-sub 05:55:55.171 [pulsar-io-4-8] INFO org.apache.pulsar.broker.service.ServerCnx - [/1.1.1.1:38238] Subscribing on topic persistent://pulsar/use1/myns/topic-1 / my-sub 05:58:02.655 [pulsar-io-4-11] INFO org.apache.pulsar.broker.service.ServerCnx - [/1.1.1.1:38316] Subscribing on topic persistent://pulsar/use1/myns/topic-1 / my-sub 06:00:10.991 [pulsar-io-4-13] INFO org.apache.pulsar.broker.service.ServerCnx - [/1.1.1.1:38348] Subscribing on topic persistent://pulsar/use1/myns/topic-1 / my-sub ``` 4. client logs while creating subscription ``` 2023-05-19 05:51:40 [pulsar-client-io-8-1] INFO o.a.pulsar.client.impl.ConsumerImpl - [persistent://pulsar/use1/myns/topic-1][my-sub] Subscribing to topic on cnx [id: 0xdcb4fbde, L:/1.1.1.1:38174 - R:broker2.use1.pulsar.com/1.1.1.1:6651], consumerId 0 2023-05-19 05:52:10 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarDecoder - [broker2.use1.pulsar.com/1.1.1.1:6651] Received cmd PING 2023-05-19 05:52:10 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarHandler - [[id: 0xdcb4fbde, L:/1.1.1.1:38174 - R:broker2.use1.pulsar.com/1.1.1.1:6651]] Replying back to ping mess age 2023-05-19 05:52:10 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarHandler - [[id: 0xdcb4fbde, L:/1.1.1.1:38174 - R:broker2.use1.pulsar.com/1.1.1.1:6651]] Sending ping message 2023-05-19 05:52:10 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarDecoder - [broker2.use1.pulsar.com/1.1.1.1:6651] Received cmd PONG 2023-05-19 05:52:40 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarDecoder - [broker2.use1.pulsar.com/1.1.1.1:6651] Received cmd PING 2023-05-19 05:52:40 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarHandler - [[id: 0xdcb4fbde, L:/1.1.1.1:38174 - R:broker2.use1.pulsar.com/1.1.1.1:6651]] Replying back to ping mess age 2023-05-19 05:52:40 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarHandler - [[id: 0xdcb4fbde, L:/1.1.1.1:38174 - R:broker2.use1.pulsar.com/1.1.1.1:6651]] Sending ping message 2023-05-19 05:52:40 [pulsar-client-io-8-1] WARN o.a.pulsar.client.impl.ConsumerImpl - [persistent://pulsar/use1/myns/topic-1][my-sub] Failed to subscribe to topi c on broker2.use1.pulsar.com/1.1.1.1:6651 2023-05-19 05:52:40 [pulsar-client-io-8-1] WARN o.a.pulsar.client.impl.ClientCnx - [id: 0xdcb4fbde, L:/1.1.1.1:38174 - R:broker2.use1.pulsar.com/1.1.1.1:6651] request timeout {'durationMs': '30000', 'reqId':'3312998569082859124', 'remote':'broker2.use1.pulsar.com/1.1.1.1:6651', 'local':'/1.1.1.1:38174'} ``` ### What did you expect to see? sometime broker stucks to create subscription and it can be only fixed by unloading the topic ### What did you see instead? broker should be resilient to handle failure and should make sure to recover curosr and create subscription successfully without impacting availability of the topic. ### Anything else? _No response_ ### Are you willing to submit a PR? - [X] I'm willing to submit a PR! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
