> We should work on some kind of back-pressure mechanism for the client, but I 
> am not sure about which kind of support we should provide at BK level

Is there an issue for this? If there isn't, then perhaps we can start that way.

> And as soon as the application is notified of the result of the write 
> (success or failure) we are releasing the reference to the payload (as I have 
> shown in this email thread),
> so in theory the application has full control over the retained memory and it 
> can apply its own memory management mechanisms 

Say we have a ledger for which WQ > AQ. Are you saying that if AQ bookies 
reply, then it is possible that the entry is not going to be written to |WQ| - 
|AQ| bookies because the entry data might have been reclaimed by the 
application? The contract as I understand it is that an entry is to be 
replicated |WQ| ways, even though the application is willing to receive a 
confirmation after |AQ| bookie responses.

What am I missing?

-Flavio  

> On 13 Jan 2021, at 11:30, Enrico Olivelli <eolive...@gmail.com> wrote:
> 
> Flavio
> 
> Il giorno mar 12 gen 2021 alle ore 17:26 Flavio Junqueira <f...@apache.org 
> <mailto:f...@apache.org>> ha scritto:
> I have observed the issue that Matteo describes and I also attributed the 
> problem to the absence of a back pressure mechanism in the client. Issue 
> #2497 was not about that, though. There was some corruption going on that was 
> leading to the server receiving garbage.
> 
> Correct, #2497 is not about the topic of this email, I just mentioned it 
> because the discussion started from that comment from Matteo.
> 
> We should work on some kind of back-pressure mechanism for the client, but I 
> am not sure about which kind of support we should provide at BK level
> 
> Regarding the writer side of this story and memory usage,
> we are not performing copies of the original payload that the caller is 
> passing, in case of a ByteBuf
> see PendingAddOp
> https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L263
>  
> <https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L263>
> and here, we simply wrap it in a ByteBufList 
> https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/checksum/DigestManager.java#L116
>  
> <https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/checksum/DigestManager.java#L116>
> 
> And as soon as the application is notified of the result of the write 
> (success or failure) we are releasing the reference to the payload (as I have 
> shown in this email thread),
> so in theory the application has full control over the retained memory
> and it can apply its own memory management mechanisms
> 
> 
> Enrico
> 
> 
> -Flavio 
> 
> > On 8 Jan 2021, at 22:47, Matteo Merli <mme...@apache.org 
> > <mailto:mme...@apache.org>> wrote:
> > 
> > On Fri, Jan 8, 2021 at 8:27 AM Enrico Olivelli <eolive...@gmail.com 
> > <mailto:eolive...@gmail.com>> wrote:
> >> 
> >> Hi Matteo,
> >> in this comment you are talking about an issue you saw when WQ is greater 
> >> that AQ
> >> https://github.com/apache/bookkeeper/issues/2497#issuecomment-734423246 
> >> <https://github.com/apache/bookkeeper/issues/2497#issuecomment-734423246>
> >> 
> >> IIUC you are saying that if one bookie is slow the client continues to 
> >> accumulate references to the entries that still have not received the 
> >> confirmation from it.
> >> I think that this is correct.
> >> 
> >> Have you seen problems in production related to this scenario ?
> >> Can you tell more about them ?
> > 
> > Yes, for simplicity, assume e=3, w=3, a=2.
> > 
> > If one bookie is slow (not down, just slow), the BK client will the
> > acks to the user that the entries are written after the first 2 acks.
> > In the meantime, it will keep waiting for the 3rd bookie to respond.
> > If the bookie responds within the timeout, the entries can now be
> > dropped from memory, otherwise the write will timeout internally and
> > it will get replayed to a new bookie.
> > 
> > In both cases, the amount of memory used in the client will max at
> > "throughput" * "timeout". This can be a large amount of memory and
> > easily cause OOM errors.
> > 
> > Part of the problem is that it cannot be solved from outside the BK
> > client, since there's no visibility on what entries have 2 or 3 acks
> > and therefore it's not possible to apply backpressure. Instead,
> > there should be a backpressure mechanism in the BK client itself to
> > prevent this kind of issue.
> > One possibility there could be to use the same approach as described
> > in 
> > https://github.com/apache/pulsar/wiki/PIP-74%3A-Pulsar-client-memory-limits 
> > <https://github.com/apache/pulsar/wiki/PIP-74%3A-Pulsar-client-memory-limits>,
> > giving a max memory limit per BK client instance and throttling
> > everything after the quota is reached.
> > 
> > 
> > Matteo
> 

Reply via email to