lhotari commented on issue #23845: URL: https://github.com/apache/pulsar/issues/23845#issuecomment-2599470326
I think I finally found a potential race condition by analysing the code. To verify that the race condition is possible, it should be possible to first modify the code to introduce delays.

When a consumer is removed, its pending messages get added to the replay queue and a new read gets triggered:
https://github.com/apache/pulsar/blob/ea56ada4f3985c93b93c64d1361b3111cd98a37f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java#L243-L258

This calls `readMoreEntriesAsync`:
https://github.com/apache/pulsar/blob/ea56ada4f3985c93b93c64d1361b3111cd98a37f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java#L1432-L1437

In the "classic" implementation, there's a direct `readMoreEntries` call here instead:
https://github.com/apache/pulsar/blob/ea56ada4f3985c93b93c64d1361b3111cd98a37f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumersClassic.java#L231-L240

This also supports the theory, since the problem appears in 4.0 but not with the 3.x Key_Shared implementation. The reason why `readMoreEntriesAsync` is a problem could be explained as follows.
When `readMoreEntries` gets called, it could return early here:
https://github.com/apache/pulsar/blob/ea56ada4f3985c93b93c64d1361b3111cd98a37f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java#L338-L346

This is the code related to `sendInProgress`:
https://github.com/apache/pulsar/blob/ea56ada4f3985c93b93c64d1361b3111cd98a37f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java#L792-L814

By default, the `sendMessagesToConsumers` method gets called asynchronously:
https://github.com/apache/pulsar/blob/ea56ada4f3985c93b93c64d1361b3111cd98a37f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java#L752-L761

It first sets the `sendInProgress` flag and then schedules the call. Any `readMoreEntries` calls made before `handleSendingMessagesAndReadingMore` is called will be dropped. If `handleSendingMessagesAndReadingMore` then doesn't itself trigger a new call to `readMoreEntries` (which it can validly skip), the problem described in the issue can occur.

A similar race condition could also happen with the Shared subscription type; this is not specific to Key_Shared.
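To make the interleaving concrete, here is a minimal single-threaded Java model of it. It is a sketch under stated assumptions, not Pulsar's code: `Dispatcher`, `onEntriesRead`, `simulate` and the counters are hypothetical, the dispatch executor is modeled as a plain task queue that is drained by hand, and a "read" is just a counter increment. Only the shape follows the description above: `readMoreEntries` returns early while `sendInProgress` is set, the flag is set before the handler is scheduled, and the handler may or may not read more.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class DroppedReadRaceSketch {

    /** Hypothetical, heavily simplified dispatcher; NOT Pulsar's actual class. */
    static final class Dispatcher {
        // Models a single-threaded executor: scheduled tasks run only when drain() is called.
        private final Queue<Runnable> scheduled = new ArrayDeque<>();
        private boolean sendInProgress = false;
        int readsIssued = 0;
        int readsDropped = 0;

        /** Returns early while a send is pending; the request is not remembered anywhere. */
        void readMoreEntries() {
            if (sendInProgress) {
                readsDropped++;
                return;
            }
            readsIssued++; // stands in for issuing a real cursor read
        }

        /** Defers the read to a later task instead of running it inline. */
        void readMoreEntriesAsync() {
            scheduled.add(this::readMoreEntries);
        }

        /** Hypothetical name: sets the flag first, then schedules the handler. */
        void onEntriesRead(boolean handlerReadsMore) {
            sendInProgress = true;
            scheduled.add(() -> handleSendingMessagesAndReadingMore(handlerReadsMore));
        }

        /** Clears the flag; only reads again if this particular invocation decides to. */
        void handleSendingMessagesAndReadingMore(boolean handlerReadsMore) {
            sendInProgress = false;
            if (handlerReadsMore) {
                readMoreEntries();
            }
        }

        /** Runs every scheduled task in FIFO order. */
        void drain() {
            Runnable task;
            while ((task = scheduled.poll()) != null) {
                task.run();
            }
        }
    }

    /** Consumer removal requests a read, then a send starts before any scheduled task runs. */
    static Dispatcher simulate(boolean asyncRead, boolean handlerReadsMore) {
        Dispatcher d = new Dispatcher();
        if (asyncRead) {
            d.readMoreEntriesAsync(); // path via readMoreEntriesAsync, as described for 4.0
        } else {
            d.readMoreEntries();      // direct call, as in the "classic" implementation
        }
        d.onEntriesRead(handlerReadsMore);
        d.drain();
        return d;
    }

    private static void report(String label, Dispatcher d) {
        System.out.println(label + ": issued=" + d.readsIssued + ", dropped=" + d.readsDropped);
    }

    public static void main(String[] args) {
        report("async read, handler does not read", simulate(true, false));
        report("direct read, handler does not read", simulate(false, false));
        report("async read, handler reads more", simulate(true, true));
    }
}
```

Running `main` prints `issued=0, dropped=1` for the first case and `issued=1` for the other two. Only the first case ends with no read issued and nothing left scheduled, which is the stalled state the race above would produce; in a real broker the threads are concurrent rather than drained by hand, so this only illustrates one possible ordering.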
