lhotari commented on issue #23845:
URL: https://github.com/apache/pulsar/issues/23845#issuecomment-2599470326

   I think I finally found a potential race condition by analysing the code. 
   It should be possible to verify that the race condition can occur by 
first modifying the code to introduce delays.
   
   When a consumer is removed, the pending messages get added to the replay 
queue and a new read gets triggered:
   
https://github.com/apache/pulsar/blob/ea56ada4f3985c93b93c64d1361b3111cd98a37f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java#L243-L258
   
   This calls `readMoreEntriesAsync`:
   
https://github.com/apache/pulsar/blob/ea56ada4f3985c93b93c64d1361b3111cd98a37f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java#L1432-L1437
   
   In the "classic" implementation, there's a direct `readMoreEntries` call 
here:
   
https://github.com/apache/pulsar/blob/ea56ada4f3985c93b93c64d1361b3111cd98a37f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumersClassic.java#L231-L240
   
   This difference also supports the theory, since the problem appears in 
4.0 but not with the 3.x Key_Shared implementation.
   
   The reason why `readMoreEntriesAsync` is a problem could be explained 
as follows.
   When `readMoreEntries` gets called, it could return early here:
   
https://github.com/apache/pulsar/blob/ea56ada4f3985c93b93c64d1361b3111cd98a37f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java#L338-L346
   
   This is the code related to `sendInProgress`:
   
https://github.com/apache/pulsar/blob/ea56ada4f3985c93b93c64d1361b3111cd98a37f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java#L792-L814
   
   By default, the `sendMessagesToConsumers` method gets called asynchronously:
   
https://github.com/apache/pulsar/blob/ea56ada4f3985c93b93c64d1361b3111cd98a37f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java#L752-L761
   
   It will first set the `sendInProgress` flag and then schedule the call. Any 
`readMoreEntries` call that happens before `handleSendingMessagesAndReadingMore` 
runs will be dropped. If `handleSendingMessagesAndReadingMore` then doesn't 
trigger a new call to `readMoreEntries` (which it can validly skip), the 
problem described in the issue can occur.
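
   The sequence above can be modeled deterministically. The following is a 
minimal sketch, not Pulsar code: the class `DispatcherRaceSketch`, the task 
queue standing in for the dispatcher executor, the counters, and the 
`onEntriesRead`, `removeConsumer`, `drain` and `simulate` helpers are all 
hypothetical. Only the names `readMoreEntries`, `readMoreEntriesAsync`, 
`handleSendingMessagesAndReadingMore` and `sendInProgress` are borrowed from 
the source. It assumes the asynchronous read is queued before the flag is set 
and runs before the flag is cleared.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical single-threaded model of the suspected race; not the real dispatcher.
class DispatcherRaceSketch {
    // Stand-in for the dispatcher executor: tasks run only when drain() is called.
    final Queue<Runnable> scheduled = new ArrayDeque<>();
    boolean sendInProgress = false;
    int replayQueueSize = 0;
    int readsStarted = 0;
    int readsDropped = 0;

    // Models the early return: a read request is dropped while a send is in progress.
    void readMoreEntries() {
        if (sendInProgress) {
            readsDropped++;
            return;
        }
        if (replayQueueSize > 0) {
            readsStarted++;
            replayQueueSize = 0; // pretend the replay read delivers everything
        }
    }

    // Models the asynchronous variant: the read is queued instead of run inline.
    void readMoreEntriesAsync() {
        scheduled.add(this::readMoreEntries);
    }

    // Models consumer removal: pending messages go to the replay queue, then a read is requested.
    void removeConsumer(int pendingMessages) {
        replayQueueSize += pendingMessages;
        readMoreEntriesAsync();
    }

    // Models a completed read: the flag is set first, then the send is scheduled.
    void onEntriesRead(boolean triggerReadAfterSend) {
        sendInProgress = true;
        scheduled.add(() -> handleSendingMessagesAndReadingMore(triggerReadAfterSend));
    }

    // Clears the flag; whether a follow-up read happens is a parameter of this model.
    void handleSendingMessagesAndReadingMore(boolean triggerRead) {
        sendInProgress = false;
        if (triggerRead) {
            readMoreEntries();
        }
    }

    // Runs all queued tasks in FIFO order.
    void drain() {
        Runnable task;
        while ((task = scheduled.poll()) != null) {
            task.run();
        }
    }

    // Replays the assumed interleaving and returns the final state for inspection.
    static DispatcherRaceSketch simulate(boolean triggerReadAfterSend) {
        DispatcherRaceSketch d = new DispatcherRaceSketch();
        d.removeConsumer(3);                   // read request is queued
        d.onEntriesRead(triggerReadAfterSend); // flag set before the queued read runs
        d.drain();                             // queued read is dropped, then the send runs
        return d;
    }

    public static void main(String[] args) {
        DispatcherRaceSketch stalled = simulate(false);
        System.out.println("replay entries left without follow-up read: " + stalled.replayQueueSize);
        DispatcherRaceSketch recovered = simulate(true);
        System.out.println("replay entries left with follow-up read: " + recovered.replayQueueSize);
    }
}
```

   In this model, `simulate(false)` ends with entries still in the replay 
queue and no read started, while `simulate(true)` drains them. That matches 
the description above only under the stated ordering assumption; inserting 
delays into the real code would still be needed to confirm it.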
   
   A similar race condition could also happen with the Shared subscription 
type; it is not specific to Key_Shared.
   
   
   

