dao-jun opened a new pull request, #22792: URL: https://github.com/apache/pulsar/pull/22792
Fixes https://github.com/apache/pulsar/issues/22129

### Motivation

Pulsar uses binary search to find a message by timestamp, which reduces the number of iterations needed and makes the lookup faster. The current implementation is correct, but even with binary search it is still not efficient *enough*: it searches across all the ledgers to find the message by timestamp. This is a performance bottleneck, especially for large topics with many messages. For example, for a topic with 1m entries, the binary search takes about 20 iterations to find the message. In some extreme cases this may lead to a timeout, and the client will not be able to seek by timestamp.

The motivation of this PR is to optimize finding a message by timestamp, making it more efficient and faster.

### Modifications

Before searching entries, calculate the `start` and `end` positions from `LedgerInfo#timestamp` and search *only* the entries in that range, to avoid searching all the ledgers.

### Verifying this change

- [ ] Make sure that the change passes the CI checks.

*(Please pick either of the following options)*

This change is a trivial rework / code cleanup without any test coverage.

*(or)*

This change is already covered by existing tests, such as *(please describe tests)*.

*(or)*

This change added tests and can be verified as follows:

*(example:)*
  - *Added integration tests for end-to-end deployment with large payloads (10MB)*
  - *Extended integration test for recovery after broker failure*

### Does this pull request potentially affect one of the following parts:

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

*If the box was checked, please highlight the changes*

- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository:

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
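
The range narrowing described under Modifications might look roughly like the following sketch. It is not the code from this PR. It assumes that `LedgerInfo#timestamp` holds the time a ledger was closed and is `0` for a ledger that is still open, and that ledgers are listed in creation order. The class and method names (`TimestampRangeSketch`, `narrowLedgers`, `maxIterations`) are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only; not the implementation in this PR.
class TimestampRangeSketch {

    // closeTimestamps[i] is the assumed close time (ms) of ledger i, in ledger
    // order; 0 is assumed to mean "still open". Returns {firstIndex, lastIndex}
    // of the ledgers that still need an entry-level search, or {-1, -1} if
    // there are no ledgers.
    static int[] narrowLedgers(long[] closeTimestamps, long target) {
        int n = closeTimestamps.length;
        if (n == 0) {
            return new int[] {-1, -1};
        }
        // Find the first ledger that is open or was closed at/after the target.
        // If none qualifies, fall back to the last ledger.
        int lo = 0;
        int hi = n - 1;
        int first = n - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long ts = closeTimestamps[mid];
            if (ts == 0 || ts >= target) {
                first = mid;
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        // Conservatively include the previous ledger as the lower bound.
        int start = Math.max(first - 1, 0);
        return new int[] {start, first};
    }

    // Worst-case number of halving steps for a binary search over n entries,
    // i.e. ceil(log2(n)) for n >= 1.
    static int maxIterations(long n) {
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    public static void main(String[] args) {
        long[] closeTimestamps = {100, 200, 300, 0};
        System.out.println(Arrays.toString(narrowLedgers(closeTimestamps, 250)));
        System.out.println(maxIterations(1_000_000L)); // 20, matching the estimate above
    }
}
```

With the entry-level binary search restricted to the ledgers in the returned range, the number of entry reads depends on the size of those ledgers rather than on the size of the whole topic.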
