sijie commented on issue #1088: ISSUE #1086 (@bug W-4146427@) Client-side backpressure in netty (Fixes: io.netty.util.internal.OutOfDirectMemoryError under continuous heavy load) URL: https://github.com/apache/bookkeeper/pull/1088#issuecomment-361807580 @dlg99 I see the metrics do show great results. However I have one concern about this change, it is basically blocking the scheduler threads on waiting a slow netty channel, this can result in bad side-effects - e.g responses might not be able to be processed, other requests will also potentially be blocked as well. so I guess the benchmark itself is only testing writes and most likely one one or a few ledgers. In this case, the benchmark can easily show improvements, because there is effectively only one client/ledger handle is writing. However a real-world use case can be different, this change here will cause a situation - a slow bookie can potentially impact the whole application that use the bookkeeper client. Ideally I think the throttling/back-pressure should be pop back to the ledger handles who issue the write request, rather than asking PerChannelBookieClient to do the blocking-wait work. I am not sure what would be the best approach here. One thought I have here: - when a ledger handle (effectively PendingAddOp) attempts to sendWriteRequest, PCBC checks if the channel is writable, if it is not writable, fail the write request with a special exception. PCBC itself doesn't block on waiting channel to become writable. - when PendingAddOp receives this special exception, it knows the channel is not available for write. then you can apply the throttling logic here: 1) you can notify the ledger handle that a channel is not available, asking the ledger handle to slow down applications or block the application 2) at the same time, you can listen on the channel state change. once the channel is writable, issue the write to the channel, and unblock the application for submitting add entries. In this way, you can block the application thread rather than the internal io scheduler which can be shared across multiple ledger handles.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services