lhotari commented on PR #23901: URL: https://github.com/apache/pulsar/pull/23901#issuecomment-2619167696
> Could we confirm if we are not over-engineering this rate limiter?

Good questions, @heesung-sn.

> * can't we reuse any existing queue? why do we need a separate queue here?

That's a valid point. In the current solution, there's also a queue when `managedLedgerReadEntryTimeoutSeconds` is set: the executor's queue is used as the queue. That's a bad solution, since there will be a tight loop waiting for more permits to become available, which consumes unnecessary CPU. In addition, there's a fairness problem: requests acquiring permits aren't handled in a consistent order. The separate queue in InflightReadsLimiter addresses both issues.

> why do we need a separate queue here?

There will always be a need for a queue unless excessive traffic is shed by rejecting requests and making them fail. It's a valid question whether the implementation could queue requests elsewhere entirely, or fail them. The current solution already rejects requests when the limit is reached. That's a problem, especially for enabling the `managedLedgerMaxReadsInFlightSizeInMB` limit for replay queue reads, which is the scope of this PR. If individual reads fail, the complete result will be discarded due to this logic:

https://github.com/apache/pulsar/blob/331a997b76b83b3eca777c4559eb60b940d30c27/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1628-L1658

(I have other plans to improve this logic, but that's not in the scope of this PR.)

That's why failing individual read operations isn't a proper way to trigger backpressure in Pulsar: it adds even more load to the system when retries happen. A better approach is to slow down processing (queue) and let all the other backpressure mechanisms propagate and eventually slow down incoming work into the system. That's how backpressure should work, but there are currently gaps in Pulsar.
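To make the queue-vs-rejection point concrete, here is a minimal sketch (hypothetical names, not the actual Pulsar `InflightReadsLimiter`) of an asynchronous in-flight-bytes limiter: instead of busy-waiting on the executor's queue or failing the read, waiters are parked in a FIFO queue and resumed in arrival order when permits are released, which gives both the CPU savings and the fairness described above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of an async in-flight limiter with a FIFO waiter queue.
public class AsyncInflightLimiter {
    private final long maxBytes;
    private long inflightBytes;
    // FIFO queue preserves fairness: waiters resume in arrival order.
    private final Queue<Waiter> pending = new ArrayDeque<>();

    private record Waiter(long bytes, Runnable onAcquired) {}

    public AsyncInflightLimiter(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Acquire permits; when the limit is reached, queue the callback
    // instead of failing the read or spinning in a tight loop.
    public synchronized void acquire(long bytes, Runnable onAcquired) {
        // Even if this request would fit, queue it behind earlier waiters.
        if (pending.isEmpty() && inflightBytes + bytes <= maxBytes) {
            inflightBytes += bytes;
            onAcquired.run();
        } else {
            pending.add(new Waiter(bytes, onAcquired));
        }
    }

    // Release permits when a read completes, then resume queued waiters
    // in FIFO order for as long as permits suffice.
    public synchronized void release(long bytes) {
        inflightBytes -= bytes;
        while (!pending.isEmpty()
                && inflightBytes + pending.peek().bytes() <= maxBytes) {
            Waiter w = pending.poll();
            inflightBytes += w.bytes();
            w.onAcquired().run();
        }
    }

    public static void main(String[] args) {
        AsyncInflightLimiter limiter = new AsyncInflightLimiter(10);
        List<String> order = new ArrayList<>();
        limiter.acquire(6, () -> order.add("read A")); // fits, runs immediately
        limiter.acquire(6, () -> order.add("read B")); // over limit, queued
        limiter.acquire(3, () -> order.add("read C")); // would fit, but queues behind B (fairness)
        limiter.release(6);                            // read A completes; B then C resume in order
        System.out.println(order);                     // prints [read A, read B, read C]
    }
}
```

Note how "read C" waits even though its bytes would fit: without that check, small reads would continually jump ahead of large queued reads, reproducing the inconsistent ordering problem of the executor-queue approach.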
There's a broader design problem: backpressure isn't designed end-to-end so that this would actually happen and be validated with analysis and tests. There was [a discussion about backpressure on the dev mailing list in 2022](https://lists.apache.org/thread/v7xy57qfzbhopoqbm75s6ng8xlhbr2q6) without much participation from the community ([backpressure was also covered briefly in the rate limiting discussions](https://lists.apache.org/thread/rs95ww4mfx1x08nk2zsr5d31p4wkpy81)).

In general, there are solutions for keeping queues minimal in backpressure designs. When all operations are blocking, it's a simple problem to solve; with asynchronous operations, it becomes a hard problem. In the JVM space, the Reactive Streams specification and libraries are explicitly focused on solving asynchronous non-blocking backpressure with minimal queues (buffers) in processing pipelines.

In the case of the `managedLedgerMaxReadsInFlightSizeInMB` limit, I believe that queuing in the way this PR handles it is the correct solution for backpressure. The reason is that the majority of read operations are created by dispatchers in Pulsar. There's a finite number of dispatchers at all times, and each dispatcher has at most 2 read operations in flight at a time (1 pending read and 1 pending replay queue read). This makes it a "closed system": when the limit is reached, the dispatchers won't add more work into the system and will wait for more permits to become available.

> * why can't we use simple Guava.RateLimiter or other simpler method?

Essentially, InflightReadsLimiter is like Java's java.util.concurrent.Semaphore. It's not a rate limiter where the rate is defined in terms of requests per second. The InflightReadsLimiter class is already very simple:

* The complete class file, including all comment headers, comments, and empty lines, is 285 lines.
* A lot of those code lines are related to metrics.
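The Semaphore analogy can be made concrete. A semaphore bounds how much work is *in flight at once* (permits return when operations complete), whereas a rate limiter bounds how often work *starts* (permits are replenished by the passage of time). A minimal sketch of the semaphore semantic, using illustrative sizes rather than anything from the PR:

```java
import java.util.concurrent.Semaphore;

public class LimiterSemantics {
    public static void main(String[] args) {
        // A semaphore caps the *amount in flight*: here, at most 1 MB of
        // reads outstanding at once, regardless of how fast requests arrive.
        Semaphore inflightBytes = new Semaphore(1024 * 1024);

        int readSize = 512 * 1024;                    // pretend each read is 512 KB
        inflightBytes.acquireUninterruptibly(readSize); // waits only while over the cap
        try {
            // ... perform the read ...
        } finally {
            inflightBytes.release(readSize);          // permits return when the read completes
        }

        // A rate limiter (e.g. Guava's RateLimiter.create(100)) would instead
        // cap *starts per second*; its permits are replenished by time passing,
        // not by operations completing. That's the wrong semantic for an
        // in-flight-size limit like managedLedgerMaxReadsInFlightSizeInMB.
        System.out.println("available=" + inflightBytes.availablePermits());
    }
}
```

With a rate limiter, a burst of slow, large reads could keep starting at the configured rate and accumulate unbounded memory; the semaphore-style limit is what actually bounds concurrent memory usage.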
While there might be room for further simplification in InflightReadsLimiter, I believe the current implementation strikes a good balance between functionality and simplicity. The core logic is quite straightforward, and the additional code is mostly for metrics and monitoring, which are important for production systems. How would you suggest simplifying InflightReadsLimiter?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
