zymap opened a new issue, #3231: URL: https://github.com/apache/bookkeeper/issues/3231
# BookKeeper client memory limits

# Motivation

If one bookie is slow (not down, just slow), the BK client acknowledges to the user that the entries are written after the first 2 acks. In the meantime, it keeps waiting for the 3rd bookie to respond. If the bookie responds within the timeout, the entries can be dropped from memory; otherwise the write times out internally and is replayed to a new bookie.

In both cases, the amount of memory used in the client can reach "throughput" * "timeout". For example, at 100 MB/s with a 30-second timeout, that is up to about 3 GB. This can be a large amount of memory and can easily cause OOM errors.

Part of the problem is that it cannot be solved from outside the BK client: there is no visibility into which entries have 2 or 3 acks, so it is not possible to apply backpressure. Instead, there should be a backpressure mechanism in the BK client itself to prevent this kind of issue.

# Proposed Change

We introduced a memory limit controller in the PR [https://github.com/apache/bookkeeper/pull/2710](https://github.com/apache/bookkeeper/pull/2710), so we can apply it in the bookie client to control the client's memory usage.

We acquire memory when there is an add entry request, and release the memory when the request is sent successfully from the client.

Add the `clientMemoryLimitEnabled` and `clientMemoryLimitByBytes` options to the client configuration to control the memory limit.

For more detailed information, here is the proposed PR: https://github.com/apache/bookkeeper/pull/3139.

# *Migration Plan and Compatibility*

This feature is disabled by default, so it introduces no compatibility issue.

# *Rejected Alternatives*

Using server-side backpressure. Configuring server-side backpressure won't resolve the client-side OOM issue. When WQ > AQ, the slowest bookie does not hold up add entry requests: the client won't stop adding entries, because it still receives 2 successful responses from the servers.
The client then still holds entries that are waiting for a response from the slowest bookie, so its memory usage grows quickly and it eventually hits OOM.
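
The acquire-on-add, release-on-completion scheme described under Proposed Change can be illustrated with a minimal Java sketch. This is not the controller from the linked PRs: the class name `ClientMemoryLimiterSketch` and its methods are hypothetical, chosen only for illustration, and the exact point at which the real client releases memory is defined in the PR, not here. The idea is that a blocking `reserve` call makes the writer wait while too many bytes are still held for outstanding add entry requests, which is the client-side backpressure this issue asks for.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch of a client-side memory limiter.
 * Not the BookKeeper implementation; names are illustrative only.
 */
final class ClientMemoryLimiterSketch {
    private final long limitBytes;
    private long usedBytes;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();

    ClientMemoryLimiterSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Non-blocking: reserves only if the limit would not be exceeded. */
    boolean tryReserve(long size) {
        lock.lock();
        try {
            if (usedBytes + size > limitBytes) {
                return false;
            }
            usedBytes += size;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Blocking: waits until there is room, which applies backpressure to the writer. */
    void reserve(long size) throws InterruptedException {
        if (size > limitBytes) {
            // Assumption of this sketch: reject entries that could never fit.
            throw new IllegalArgumentException("entry larger than the memory limit");
        }
        lock.lock();
        try {
            while (usedBytes + size > limitBytes) {
                released.await();
            }
            usedBytes += size;
        } finally {
            lock.unlock();
        }
    }

    /** Called when the client no longer needs to hold the entry. */
    void release(long size) {
        lock.lock();
        try {
            usedBytes -= size;
            released.signalAll();
        } finally {
            lock.unlock();
        }
    }

    long currentUsage() {
        lock.lock();
        try {
            return usedBytes;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ClientMemoryLimiterSketch limiter = new ClientMemoryLimiterSketch(1024);
        limiter.reserve(600);                        // first add entry request
        System.out.println(limiter.tryReserve(600)); // false: would exceed 1024 bytes
        limiter.release(600);                        // first request completed
        System.out.println(limiter.tryReserve(600)); // true: room is available again
    }
}
```

In this sketch a writer thread would call `reserve(entrySize)` before issuing an add entry request and `release(entrySize)` from the completion callback, so a slow bookie stalls the writer instead of growing the heap without bound.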
