Radai,

Thanks for the proposal. A couple of comments on this.

1. Since we store request objects in the request queue, how do we get an
accurate size estimate for those requests?

2. Currently, it's bad if the processor blocks on adding a request to the
request queue. Once blocked, the processor can't send responses for its
other socket keys either. This will cause all clients on this processor
with an outstanding request to eventually time out. Typically, this will
trigger client-side retries, which add more load on the broker and can
cause even more congestion in the request queue. With queued.max.requests,
our recommendation for preventing blocking is to set it to the number of
socket connections on the broker. Since the broker never processes more
than 1 request per connection at a time, adding to the request queue will
never block. With queued.max.bytes, it's going to be harder to configure
the value properly to prevent blocking.
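
To make the difference concrete, here is a rough sketch in Java. It is not
Kafka code; the class and method names are made up for illustration. It
assumes each connection contributes at most one queued request and counts
how many of those requests would find the queue full under each kind of
bound.

```java
// Hypothetical illustration only; not Kafka's request queue implementation.
final class QueueBoundSketch {

    /**
     * Count bound (like queued.max.requests): with one request per
     * connection, only requests beyond maxRequests find the queue full.
     */
    static int blockedByCountBound(int[] requestSizes, int maxRequests) {
        return Math.max(0, requestSizes.length - maxRequests);
    }

    /**
     * Byte bound (like the proposed queued.max.bytes): whether a request
     * fits depends on the sizes of the requests already queued.
     */
    static int blockedByByteBound(int[] requestSizes, long maxBytes) {
        long queuedBytes = 0;
        int blocked = 0;
        for (int size : requestSizes) {
            if (queuedBytes + size <= maxBytes) {
                queuedBytes += size;   // request fits, enqueue it
            } else {
                blocked++;             // the processor would block here
            }
        }
        return blocked;
    }
}
```

With four connections and a count bound of 4, nothing blocks regardless of
request size. With a byte bound, a single large request among the four can
exceed the limit, so a safe value depends on request sizes that the
operator does not control.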

So, while adding queued.max.bytes is potentially useful for memory
management, for it to be really useful in practice we probably need to
address the processor blocking issue. One possibility is to apply
back-pressure to the client when the request queue is full. For example,
if the processor notices that the request queue is full, it can turn off
the read interest bit for all of its socket keys. This will allow the
processor to continue handling responses. When the request queue has space
again, it can indicate the new state to the processor and wake up the
selector. Not sure how this will work with multiple processors, though,
since the request queue is shared across all processors.
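
A minimal sketch of that idea in Java NIO, for a single processor only. It
is not Kafka's actual Processor or RequestChannel code: the class name,
method names, and the byte[] request type are all assumptions for
illustration, and it does not address the multi-processor question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; names and structure are not taken from Kafka.
final class BackPressureSketch {
    private final Selector selector;
    private final BlockingQueue<byte[]> requestQueue;
    // Set by the thread that drains the queue, consumed by the processor thread.
    private final AtomicBoolean resumeRequested = new AtomicBoolean(false);
    private boolean readsMuted = false;

    BackPressureSketch(Selector selector, BlockingQueue<byte[]> requestQueue) {
        this.selector = selector;
        this.requestQueue = requestQueue;
    }

    /** Non-blocking enqueue. On a full queue, stop reading instead of blocking.
     *  The caller must hold on to the rejected request and retry it later. */
    boolean tryEnqueue(byte[] request) {
        if (requestQueue.offer(request)) return true;
        setReadInterest(false);
        return false;
    }

    /** Called from another thread once the queue has space again. */
    void signalSpaceAvailable() {
        resumeRequested.set(true);
        selector.wakeup();  // make a blocked select() return promptly
    }

    /** Called by the processor thread on each pass of its select loop. */
    void applyPendingState() {
        if (resumeRequested.getAndSet(false) && readsMuted) setReadInterest(true);
    }

    private void setReadInterest(boolean enabled) {
        for (SelectionKey key : selector.keys()) {
            // Skip cancelled keys and channels that cannot be read from.
            if (!key.isValid() || (key.channel().validOps() & SelectionKey.OP_READ) == 0) continue;
            int ops = key.interestOps();
            key.interestOps(enabled ? ops | SelectionKey.OP_READ : ops & ~SelectionKey.OP_READ);
        }
        readsMuted = !enabled;
    }

    /** Small walk-through using a Pipe as a stand-in for a client socket. */
    static String demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
            BackPressureSketch p = new BackPressureSketch(selector, new ArrayBlockingQueue<>(1));
            StringBuilder out = new StringBuilder();
            out.append(p.tryEnqueue(new byte[1]));                       // queue has room
            out.append(',').append(p.tryEnqueue(new byte[1]));           // queue full, reads muted
            out.append(',').append((key.interestOps() & SelectionKey.OP_READ) != 0);
            p.requestQueue.poll();                                       // a handler drains one request
            p.signalSpaceAvailable();
            p.applyPendingState();                                       // reads re-enabled
            out.append(',').append((key.interestOps() & SelectionKey.OP_READ) != 0);
            pipe.sink().close();
            pipe.source().close();
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only the read bit is cleared, so any write interest on the keys stays in
place and responses keep flowing while reads are muted. The resume signal
goes through a flag plus wakeup() so that the interest sets are only ever
changed by the processor thread itself.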

Thanks,

Jun



On Thu, Aug 4, 2016 at 11:28 AM, radai <radai.rosenbl...@gmail.com> wrote:

> Hello,
>
> I'd like to initiate a discussion about
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-72%3A+Allow+Sizing+Incoming+Request+Queue+in+Bytes
>
> The goal of the KIP is to allow configuring a bound on the capacity (as in
> bytes of memory used) of the incoming request queue, in addition to the
> current bound on the number of messages.
>
> This comes after several incidents at LinkedIn where a sudden "spike" of
> large message batches caused an out-of-memory exception.
>
> Thank you,
>
>    Radai
>
