I like the simplicity of the approach and can see that it is an improvement
over the current implementation in typical scenarios. But I would like to
see Jun's proposal to mute sockets explored further. With the proposal in
the KIP to limit queue size, I am not sure how to calculate the total
memory requirements for brokers to avoid OOM, particularly to avoid DoS
attacks in a secure cluster. Even though the KIP says the actual memory
bound is "queued.max.bytes + socket.request.max.bytes", it seems to me
that even with the queue limits, brokers will need close to
(socket.request.max.bytes * n) bytes of memory to guarantee they don't
OOM (where n is the number of connections). That upper limit is no
different from the current implementation. With client quotas that delay
responses, and with client requests that may be retried because
processors are blocked, the next select operation after space opens up in
the request queue (unblocking the processor) could read a large amount of
data from many connections at once. So controlling the amount of data
read from the socket could be as important as controlling the request
queue size.
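To make the muting idea concrete, here is a rough sketch against plain Java
NIO (purely illustrative; the class and method names are mine, not actual
Kafka code):

```java
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical sketch: when the request queue is full, clear OP_READ on every
// key so subsequent select() calls read no new requests from any connection,
// while response writes can still proceed. Restore OP_READ once the queue has
// space again. Assumes the selector only holds client connections (as in a
// processor), so OP_READ is a valid interest op for every key.
public class SocketMuteSketch {

    public static void muteAll(Selector selector) {
        for (SelectionKey key : selector.keys()) {
            if (key.isValid()) {
                key.interestOps(key.interestOps() & ~SelectionKey.OP_READ);
            }
        }
    }

    public static void unmuteAll(Selector selector) {
        for (SelectionKey key : selector.keys()) {
            if (key.isValid()) {
                key.interestOps(key.interestOps() | SelectionKey.OP_READ);
            }
        }
        // Wake a blocked select() so the new interest set takes effect.
        selector.wakeup();
    }
}
```

Clearing the read interest bit is what would stop the next select from
pulling in more data from many connections at once.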

I don't mind this being done in two stages, but it would be good to
understand what the final solution will look like before this one is
committed.
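As a concrete point of reference for the drop-in replacement of
ArrayBlockingQueue mentioned further down the thread, a byte-bounded queue
could be sketched roughly like this (hypothetical code; the names and exact
semantics are my own assumptions, not what the KIP specifies):

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.ToLongFunction;

// Hypothetical sketch of a request queue bounded by total bytes rather than
// by element count. put() blocks while the byte budget is exhausted, which is
// exactly the processor-blocking behavior this thread is concerned about.
public class ByteBoundedQueue<T> {
    private final ArrayDeque<T> deque = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();
    private final ToLongFunction<T> sizeOf; // caller supplies the size estimate
    private final long maxBytes;            // analogous to queued.max.bytes
    private long currentBytes = 0;

    public ByteBoundedQueue(long maxBytes, ToLongFunction<T> sizeOf) {
        this.maxBytes = maxBytes;
        this.sizeOf = sizeOf;
    }

    public void put(T item) throws InterruptedException {
        lock.lock();
        try {
            // Admit a single oversized item into an empty queue so a request
            // larger than the whole budget can never wedge the queue.
            while (currentBytes >= maxBytes && !deque.isEmpty()) {
                notFull.await();
            }
            deque.addLast(item);
            currentBytes += sizeOf.applyAsLong(item);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (deque.isEmpty()) {
                notEmpty.await();
            }
            T item = deque.removeFirst();
            currentBytes -= sizeOf.applyAsLong(item);
            notFull.signal();
            return item;
        } finally {
            lock.unlock();
        }
    }
}
```

Note that an accurate sizeOf estimate is doing all the work here, which ties
back to the question in the thread of how to size request objects.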


On Tue, Aug 9, 2016 at 1:58 AM, Becket Qin <becket....@gmail.com> wrote:

> Thought about this again. If I understand correctly Jun's concern is about
> the cascading effect. Currently the processor will try to put all the
> requests received in one poll() call into the RequestChannel. This could
> potentially be long if the queue is moving really really slowly. If we
> don't mute the sockets and clients time out then reconnect, the processor
> may read more requests from the new sockets and take even longer to put
> them into the RequestChannel. So it would be good to see if we can avoid
> this cascading effect.
>
> On the other hand, allowing the processor to keep reading requests from
> the sockets when the RequestChannel is full also seems to hurt the memory
> usage control effort.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Aug 8, 2016 at 4:46 PM, radai <radai.rosenbl...@gmail.com> wrote:
>
> > I agree that filling up the request queue can cause clients to time out
> > (and presumably retry?). However, for the workloads where we expect this
> > configuration to be useful the alternative is currently an OOM crash.
> > In my opinion an initial implementation of this feature could be
> > constrained to a simple drop-in replacement of ArrayBlockingQueue
> > (conditional, opt-in) and further study of behavior patterns under load
> > can drive future changes to the API later when those behaviors are
> > better understood (like back-pressure, nop filler responses to avoid
> > client timeouts or whatever).
> >
> > On Mon, Aug 8, 2016 at 2:23 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> >
> > > Nice write up Radai.
> > > I think what Jun said is a valid concern.
> > > If I am not wrong, as per the proposal we are depending on the entire
> > > pipeline to flow smoothly, from accepting requests to handling them,
> > > calling KafkaApis, and handing back the responses.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > >
> > > On Mon, Aug 8, 2016 at 12:22 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > >
> > > > > Hi Becket,
> > > > >
> > > > > I don't think progress can be made in the processor's run loop if
> > > > > the queue fills up. i.e., I think Jun's point is that if the queue
> > > > > is full (either due to the proposed max.bytes or today due to
> > > > > max.requests hitting the limit) then processCompletedReceives will
> > > > > block and no further progress can be made.
> > > > >
> > > >
> > > > I'm sorry - this isn't right. There will be progress as long as the
> > > > API handlers are able to pick requests off the request queue and add
> > > > the responses to the response queues (which are effectively
> > > > unbounded). However, the point is valid that blocking in the request
> > > > channel's put has the effect of exacerbating the pressure on the
> > > > socket server.
> > > >
> > > >
> > > > >>
> > > > >> On Mon, Aug 8, 2016 at 10:04 AM, Jun Rao <j...@confluent.io> wrote:
> > > > >>
> > > > >> > Radai,
> > > > >> >
> > > > >> > Thanks for the proposal. A couple of comments on this.
> > > > >> >
> > > > >> > 1. Since we store request objects in the request queue, how do
> > > > >> > we get an accurate size estimate for those requests?
> > > > >> >
> > > > >> > 2. Currently, it's bad if the processor blocks on adding a
> > > > >> > request to the request queue. Once blocked, the processor can't
> > > > >> > process the sending of responses on other socket keys either.
> > > > >> > This will cause all clients in this processor with an
> > > > >> > outstanding request to eventually time out. Typically, this will
> > > > >> > trigger client-side retries, which will add more load on the
> > > > >> > broker and potentially cause more congestion in the request
> > > > >> > queue. With queued.max.requests, to prevent blocking on the
> > > > >> > request queue, our recommendation is to configure
> > > > >> > queued.max.requests to be the same as the number of socket
> > > > >> > connections on the broker. Since the broker never processes more
> > > > >> > than 1 request per connection at a time, the request queue will
> > > > >> > never be blocked. With queued.max.bytes, it's going to be harder
> > > > >> > to configure the value properly to prevent blocking.
> > > > >> >
> > > > >> > So, while adding queued.max.bytes is potentially useful for
> > > > >> > memory management, for it to be truly useful in practice we
> > > > >> > probably need to address the processor blocking issue. One
> > > > >> > possibility is to apply back-pressure to the client when the
> > > > >> > request queue is blocked. For example, if the processor notices
> > > > >> > that the request queue is full, it can turn off the interest bit
> > > > >> > for read for all socket keys. This will allow the processor to
> > > > >> > continue handling responses. When the request queue has space
> > > > >> > again, it can indicate the new state to the processor and wake
> > > > >> > up the selector. Not sure how this will work with multiple
> > > > >> > processors, though, since the request queue is shared across all
> > > > >> > processors.
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > Jun
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Thu, Aug 4, 2016 at 11:28 AM, radai <radai.rosenbl...@gmail.com> wrote:
> > > > >> >
> > > > >> > > Hello,
> > > > >> > >
> > > > >> > > I'd like to initiate a discussion about
> > > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-72%3A+Allow+Sizing+Incoming+Request+Queue+in+Bytes
> > > > >> > >
> > > > >> > > The goal of the KIP is to allow configuring a bound on the
> > > > >> > > capacity (as in bytes of memory used) of the incoming request
> > > > >> > > queue, in addition to the current bound on the number of
> > > > >> > > messages.
> > > > >> > >
> > > > >> > > This comes after several incidents at LinkedIn where a sudden
> > > > >> > > "spike" of large message batches caused an out of memory
> > > > >> > > exception.
> > > > >> > >
> > > > >> > > Thank you,
> > > > >> > >
> > > > >> > >    Radai
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -Regards,
> > > Mayuresh R. Gharat
> > > (862) 250-7125
> > >
> >
>



-- 
Regards,

Rajini
