lhotari opened a new pull request, #23226:
URL: https://github.com/apache/pulsar/pull/23226

   Main Issue: #23200 
   
   ### Motivation
   
   There's currently a clear problem with Key_Shared: in normal operation, it causes a large number of "ack holes", which result in several problems. One of them is the latency issue explained in #23200. Another is that the number of "ack holes" exceeds `managedLedgerMaxUnackedRangesToPersist` (10000) in ordinary cases, such as the demonstration in #23200.
   
   A large number of "ack holes" has also been present in multiple other issues where Pulsar users experienced problems. One of the previous mitigations is [PIP-299: Stop dispatch messages if the individual acks will be lost in the persistent storage](https://github.com/apache/pulsar/blob/master/pip/pip-299.md). The need for PIP-299 shows that a large number of "ack holes" is a fairly common problem.
   
   ### Modifications
   
   While experimenting with #23200, I found that the changes in #7105 were related to the cause of the issue.
   I also noticed that #18315 contained some impactful changes (https://github.com/apache/pulsar/pull/18315/files#diff-c48d5c94842ac8c9a0c9314b207298069f38c8dcfeda4a9886fb3bb1f77843f2).
   
   Based on this information, I decided to implement a backoff for the case where no messages are dispatched.
   This PR reschedules the call to `readMoreEntries` with an exponentially increasing delay for as long as no entries are dispatched. The backoff delay starts at 100ms and is capped at 5000ms. These values are currently static, but they could be made configurable.
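   
   To illustrate the backoff behavior described above, here is a minimal sketch in plain Java. It is not the code in this PR: the class `DispatchBackoff` and its methods `nextDelayMs`, `reset`, and `onReadComplete` are hypothetical names, and `readMoreEntries` is modeled as an arbitrary `Runnable` scheduled on a JDK `ScheduledExecutorService`. It only assumes what the description states: the delay starts at 100ms, doubles while nothing is dispatched, is capped at 5000ms, and resets once entries are dispatched again.
   
   ```java
   import java.util.concurrent.ScheduledExecutorService;
   import java.util.concurrent.TimeUnit;
   
   /**
    * Hypothetical sketch of an exponential backoff for rescheduling reads when
    * no entries were dispatched. Not Pulsar's actual dispatcher code; all names
    * here are illustrative assumptions.
    */
   final class DispatchBackoff {
       private final long initialDelayMs;
       private final long maxDelayMs;
       private long nextDelayMs;
   
       DispatchBackoff(long initialDelayMs, long maxDelayMs) {
           if (initialDelayMs <= 0 || maxDelayMs < initialDelayMs) {
               throw new IllegalArgumentException("invalid backoff bounds");
           }
           this.initialDelayMs = initialDelayMs;
           this.maxDelayMs = maxDelayMs;
           this.nextDelayMs = initialDelayMs;
       }
   
       /** Returns the delay for the next retry and doubles it (up to the cap) for the one after. */
       long nextDelayMs() {
           long current = nextDelayMs;
           // Compare against half the cap first so that doubling can never overflow.
           nextDelayMs = current > maxDelayMs / 2 ? maxDelayMs : current * 2;
           return current;
       }
   
       /** Restores the initial delay; called once a read dispatches at least one entry. */
       void reset() {
           nextDelayMs = initialDelayMs;
       }
   
       /**
        * Hypothetical hook invoked after a read completes. If nothing was dispatched,
        * the next read is delayed by the current backoff; otherwise the backoff is
        * reset and the next read is submitted without delay.
        */
       void onReadComplete(int dispatchedCount, ScheduledExecutorService executor, Runnable readMoreEntries) {
           if (dispatchedCount > 0) {
               reset();
               executor.execute(readMoreEntries);
           } else {
               executor.schedule(readMoreEntries, nextDelayMs(), TimeUnit.MILLISECONDS);
           }
       }
   }
   ```
   
   With `new DispatchBackoff(100, 5000)`, consecutive empty reads are retried after 100, 200, 400, 800, 1600, 3200, and then 5000ms for every further empty read. Resetting on a successful dispatch keeps the normal read path free of added latency, so the delay only applies while the dispatcher is unable to dispatch anything.
   
   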
   
   ### Additional context
   
   While testing this change, I noticed that it also mitigates the problem in the reproducer of #23200.
   
   With the changes in this PR, these are the results:
   ```
   2024-08-26T16:09:42,328+0300 [main] INFO  playground.TestScenarioIssueKeyShared - Done receiving. Remaining: 0 duplicates: 0 unique: 1000000
   max latency difference of subsequent messages: 974 ms
   max ack holes: 668
   2024-08-26T16:09:42,329+0300 [main] INFO  playground.TestScenarioIssueKeyShared - Consumer consumer1 received 259642 unique messages 0 duplicates in 456 s, max latency difference of subsequent messages 763 ms
   2024-08-26T16:09:42,329+0300 [main] INFO  playground.TestScenarioIssueKeyShared - Consumer consumer2 received 233963 unique messages 0 duplicates in 456 s, max latency difference of subsequent messages 974 ms
   2024-08-26T16:09:42,329+0300 [main] INFO  playground.TestScenarioIssueKeyShared - Consumer consumer3 received 244279 unique messages 0 duplicates in 457 s, max latency difference of subsequent messages 898 ms
   2024-08-26T16:09:42,329+0300 [main] INFO  playground.TestScenarioIssueKeyShared - Consumer consumer4 received 262116 unique messages 0 duplicates in 456 s, max latency difference of subsequent messages 657 ms
   ```
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
   - [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->

