dlg99 commented on issue #1088: ISSUE #1086 (@bug W-4146427@) Client-side backpressure in netty (Fixes: io.netty.util.internal.OutOfDirectMemoryError under continuous heavy load)
URL: https://github.com/apache/bookkeeper/pull/1088#issuecomment-361739328

@sijie I'll start with p.2: backpressure is enabled all the time, i.e. CHANNEL_WAIT_TIMEOUT_ON_WRITE does not affect read requests, LAC etc. Reads have their own ways of issuing speculative retries to another bookie.

It works because a write gets blocked in PendingAddOp while it is submitting requests:

```java
for (int i = 0; i < writeSet.size(); i++) {
    sendWriteRequest(writeSet.get(i));
}
```

sendWriteRequest gets blocked if we block on netty.

In our case the app limits the number of requests in flight, i.e. it can have 50 writes in flight, and in the current ensemble 2 bookies are able to handle this while the 3rd one is slow or going through a long GC. Without this change we end up submitting data to netty for all 3 bookies and submit more as soon as two of them ack the write. Netty in this case keeps buffering data for the 3rd bookie, and eventually we were getting OODME (OutOfDirectMemoryError). With this change the request ends up blocked in sendWriteRequest to the slow bookie until it either succeeds or fails to submit (hence CHANNEL_WAIT_TIMEOUT_ON_WRITE, to limit the wait for writes specifically).

The change does not help if the app can submit an unlimited number of requests, I totally agree. I think that should be addressed in a separate change building on top of this one.

There is also the server side of the backpressure story, which is not addressed in this change, specifically:
- the server has to stop accepting requests if it cannot process them fast enough
- the server has to do something if it cannot send responses to the client fast enough (slow client case) -> either stop accepting requests, or drop responses, or a combo of the two

p.1: I have a comparison of throughput with different sizes of HWM for netty (LWM = HWM-1M).
Without the change the test failed with a write error anywhere between 35 min and 58 min. With the change I managed to run the load overnight: no OODME, no write errors.

![hwatermark-tests](https://user-images.githubusercontent.com/8622884/35591468-e32b49e0-05be-11e8-88cc-ae59a909e278.png)
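
To illustrate the blocking described above, here is a minimal JDK-only sketch of a writer that waits on high/low water marks with a timeout. It is not the code from this PR and does not use Netty: `WatermarkedChannel`, `write`, `flushed` and all the numbers are hypothetical names chosen for the example. It assumes the usual water-mark behaviour, where the channel turns unwritable once pending bytes exceed the HWM and turns writable again once they drop below the LWM.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a per-bookie channel; illustration only. */
class WatermarkedChannel {
    private final long lowWaterMark;
    private final long highWaterMark;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition becameWritable = lock.newCondition();
    private long pendingBytes = 0;
    private boolean writable = true;

    WatermarkedChannel(long lowWaterMark, long highWaterMark) {
        if (lowWaterMark > highWaterMark) {
            throw new IllegalArgumentException("LWM must not exceed HWM");
        }
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    /**
     * Buffers the given number of bytes. Blocks while the channel is
     * unwritable, for at most timeoutMs (the role the comment gives to
     * CHANNEL_WAIT_TIMEOUT_ON_WRITE). Returns false on timeout or interrupt.
     */
    boolean write(long bytes, long timeoutMs) {
        long nanosLeft = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (!writable) {
                if (nanosLeft <= 0) {
                    return false; // slow peer: give up instead of buffering more
                }
                nanosLeft = becameWritable.awaitNanos(nanosLeft);
            }
            pendingBytes += bytes;
            if (pendingBytes > highWaterMark) {
                writable = false; // crossed the HWM: later writers must wait
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Called when buffered bytes have actually been sent to the peer. */
    void flushed(long bytes) {
        lock.lock();
        try {
            pendingBytes = Math.max(0, pendingBytes - bytes);
            if (!writable && pendingBytes < lowWaterMark) {
                writable = true; // dropped below the LWM: wake blocked writers
                becameWritable.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    long pendingBytes() {
        lock.lock();
        try {
            return pendingBytes;
        } finally {
            lock.unlock();
        }
    }
}
```

With one such channel per bookie, a loop like the one in PendingAddOp would stall on the slow bookie's `write` call, so an app that caps its writes in flight stops feeding the buffer for that bookie. The gap between LWM and HWM avoids flapping between writable and unwritable on every flush.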