lhotari commented on code in PR #23226:
URL: https://github.com/apache/pulsar/pull/23226#discussion_r1734462153
##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java:
##########
@@ -671,19 +682,28 @@ public final synchronized void readEntriesComplete(List<Entry> entries, Object c
// in a separate thread, and we want to prevent more reads
acquireSendInProgress();
dispatchMessagesThread.execute(() -> {
- if (sendMessagesToConsumers(readType, entries, false)) {
- updatePendingBytesToDispatch(-size);
- readMoreEntries();
- } else {
- updatePendingBytesToDispatch(-size);
- }
+ handleSendingMessagesAndReadingMore(readType, entries, false, totalBytesSize);
});
} else {
- if (sendMessagesToConsumers(readType, entries, true)) {
- updatePendingBytesToDispatch(-size);
- readMoreEntriesAsync();
- } else {
- updatePendingBytesToDispatch(-size);
+ handleSendingMessagesAndReadingMore(readType, entries, true, totalBytesSize);
+ }
+ }
+
+ private synchronized void handleSendingMessagesAndReadingMore(ReadType readType, List<Entry> entries,
+ boolean needAcquireSendInProgress,
+ long totalBytesSize) {
+ boolean triggerReadingMore = sendMessagesToConsumers(readType, entries, needAcquireSendInProgress);
+ int entriesDispatched = lastNumberOfEntriesDispatched;
+ updatePendingBytesToDispatch(-totalBytesSize);
+ if (triggerReadingMore) {
+ if (entriesDispatched > 0) {
+ // Reset the backoff when we successfully dispatched messages
+ rescheduleReadBackoff.reset();
+ // Call readMoreEntries in the same thread to trigger the next read
+ readMoreEntries();
+ } else if (entriesDispatched == 0) {
+ // If no messages were dispatched, we need to reschedule a new read with an increasing backoff delay
+ reScheduleReadInMs(rescheduleReadBackoff.next());
Review Comment:
It's possible to reproduce the issue with the instructions in #23200. I'd also expect any test application with a producer and multiple consumers on a Key_Shared subscription to reproduce the problem, as long as the consumers add a random processing delay such as this one:
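```java
// In each consumer's message-processing loop (`random` is assumed to be a java.util.Random):
// sleep for a random time (1-100 ms) with 3% probability
if (random.nextInt(100) < 3) {
    try {
        Thread.sleep(random.nextInt(100) + 1);
    } catch (InterruptedException e) {
        // restore the interrupt flag so the caller can notice the interruption
        Thread.currentThread().interrupt();
    }
}
```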
@equanz Without the backoff, `readMoreEntries` gets called immediately in the code that you pointed out. When the consumers have no permits, this repeats many times in a tight loop, and in many cases all messages in the backlog get moved to the redelivery controller for replay. That's the problem that this PR and the separate limit PR #23231 address.
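The following is a minimal, self-contained sketch of the decision that `handleSendingMessagesAndReadingMore` makes after each send, showing why the backoff breaks the tight loop. `SimpleBackoff`, `afterSend`, `scheduledDelaysMs` and the 1 ms / 1000 ms bounds are hypothetical illustrations, not Pulsar's actual backoff class or its configured defaults. A recorded delay of `0` stands in for calling `readMoreEntries()` immediately, and a positive value stands in for `reScheduleReadInMs(...)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the two branches visible in the diff (not Pulsar code).
public class ReadBackoffSketch {

    // Illustrative exponential backoff: each next() doubles the delay, capped at maxMs.
    static final class SimpleBackoff {
        private final long initialMs;
        private final long maxMs;
        private long nextMs;

        SimpleBackoff(long initialMs, long maxMs) {
            this.initialMs = initialMs;
            this.maxMs = maxMs;
            this.nextMs = initialMs;
        }

        // Returns the current delay, then doubles it (up to maxMs) for the following call.
        long next() {
            long current = nextMs;
            nextMs = Math.min(nextMs * 2, maxMs);
            return current;
        }

        // After a successful dispatch, the next empty round starts again from initialMs.
        void reset() {
            nextMs = initialMs;
        }
    }

    private final SimpleBackoff backoff = new SimpleBackoff(1, 1000);

    // Delay chosen after each send; 0 means "read again immediately".
    final List<Long> scheduledDelaysMs = new ArrayList<>();

    void afterSend(boolean triggerReadingMore, int entriesDispatched) {
        if (!triggerReadingMore) {
            return;
        }
        if (entriesDispatched > 0) {
            // progress was made: reset the backoff and read right away
            backoff.reset();
            scheduledDelaysMs.add(0L);
        } else if (entriesDispatched == 0) {
            // nothing dispatched (e.g. no permits): wait longer before the next read
            scheduledDelaysMs.add(backoff.next());
        }
    }

    public static void main(String[] args) {
        ReadBackoffSketch dispatcher = new ReadBackoffSketch();
        // Five rounds with no permits, one successful round, then another empty round.
        for (int i = 0; i < 5; i++) {
            dispatcher.afterSend(true, 0);
        }
        dispatcher.afterSend(true, 10);
        dispatcher.afterSend(true, 0);
        System.out.println(dispatcher.scheduledDelaysMs); // prints [1, 2, 4, 8, 16, 0, 1]
    }
}
```

In this model, consumers without permits cause each empty round to wait twice as long as the previous one instead of re-reading immediately, while a single successful dispatch resets the delay so throughput recovers quickly.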
--
This is an automated message from the Apache Git Service.