Hi Rajini,

Thanks for reviewing the KIP!

I agree with your suggestion to make per-IP connection rate quota a dynamic
quota for entity name IP. This will allow configuring connection rate for a
particular IP as well. I updated the wiki accordingly.

Your second concern makes sense -- rejecting the connection right away will
likely cause a new connection from the same client. I am concerned about
delaying new connections for processing later, because if the connections
keep coming with the high rate, there may be potentially a large backlog
and connections may start timing out before the broker gets to processing
them. For example, if clients come through proxy, there may be
potentially a large number of incoming connections with the same IP.

What do you think about the following option:
* Once per-IP connection rate reaches the limit, accept or drop (clean up)
the connection after a delay depending on whether the quota is still
violated. We could re-use the mechanism implemented with KIP-306 where the
broker delays the response for failed client authentication. The delay will
be set to min(delay calculated based on the rate quota, 1 second), which
matches the max delay for request quota.

I think this option is somewhat your suggestion with delaying accepting per
IP connections that reached the rate limit, but with protection in place to
make sure the number of delayed connections does not blow up. What do you
think?

Thanks,
Anna

On Sat, May 16, 2020 at 1:09 AM Alexandre Dupriez <
alexandre.dupr...@gmail.com> wrote:

> Hi Anna,
>
> Thank you for your answers and explanations.
>
> A couple of additional comments:
>
> 900. KIP-612 does not intend to dedicate a metric to the throttling of
> incoming connections. I wonder if such a metric would be handy for
> monitoring and help set-up metric-based alarming if one wishes to
> capture this type of incident?
>
> 901. Following-up on Rajini's point 2 above - from my understanding,
> this new quota should prevent excess CPU consumption in
> Processor#run() method when a new connection has been accepted.
> Through the throttling in place, connections will be delayed as
> indicated by the KIP's specifications:
>
> " If connection creation rate on the broker exceeds the broker-wide
> limit, the broker will delay accepting a new connection by an amount
> of time that brings the rate within the limit."
>
> You may be referring to the following sentence:
>
> "A new broker configuration option will be added to limit the rate at
> which connections will be accepted for each IP address. New
> connections for the IP will be dropped once the limit is reached."?
>
> 902. It may be interesting to capture the data with and without
> connection throttling under stress scenarios. You may have these data
> already. If you need a pair of hands to do some stress tests once you
> have a POC or a PR, I am happy to contribute :)
>
> Thanks,
> Alexandre
>
> Le ven. 15 mai 2020 à 12:22, Rajini Sivaram <rajinisiva...@gmail.com> a
> écrit :
> >
> > Hi Anna,
> >
> > Thanks for the KIP, looks good overall. A couple of comments about per-IP
> > connection quotas:
> >
> > 1) Should we consider making per-IP quota similar to other quotas?
> > Configured as a dynamic quota for entity type IP, with per-IP limit as
> well
> > as defaults? Perhaps that would fit better rather than configs?
> >
> > 2) The current proposal drops connections after accepting connections for
> > per-IP limit. We do this in other cases too, but in this case, should we
> > throttle instead? My point is what is the quota protecting? If we want to
> > limit rate of accepted connections, then accepting a connection and then
> > dropping doesn't really help since that IP is going to reconnect. If we
> > want to rate limit what happens next, i.e. authentication, then
> > throttling the accepted connection so its processing is delayed would
> > perhaps be better?
> >
> > Regards,
> >
> > Rajini
> >
> > On Thu, May 14, 2020 at 4:12 PM David Jacot <dja...@confluent.io> wrote:
> >
> > > Hi Anna,
> > >
> > > Thanks for your answers and the updated KIP. Looks good to me!
> > >
> > > Best,
> > > David
> > >
> > > On Thu, May 14, 2020 at 12:54 AM Anna Povzner <a...@confluent.io>
> wrote:
> > >
> > > > I updated the KIP to add a new broker configuration to limit
> connection
> > > > creation rate per IP: max.connection.creation.rate.per.ip. Once the
> limit
> > > > is reached for a particular IP address, the broker will reject the
> > > > connection from that IP (close the connection it accepted) and
> continue
> > > > rejecting them until the rate is back within the rate limit.
> > > >
> > > > On Wed, May 13, 2020 at 11:46 AM Anna Povzner <a...@confluent.io>
> wrote:
> > > >
> > > > > Hi David and Alexandre,
> > > > >
> > > > > Thanks so much for your feedback! Here are my answers:
> > > > >
> > > > > 1. Yes, we have seen several cases of clients that create a new
> > > > connection
> > > > > per produce/consume request. One hypothesis is someone who is used
> to
> > > > > connection pooling may accidentally write a Kafka client that
> creates a
> > > > new
> > > > > connection every time.
> > > > >
> > > > > 2 & 4. That's a good point I haven't considered. I think it makes
> sense
> > > > to
> > > > > provide an ability to limit connection creations per IP as well.
> This
> > > is
> > > > > not hard to implement -- the broker already keeps track of the
> number
> > > of
> > > > > connections per IP, and immediately closes a new connection if it
> comes
> > > > > from an IP that reached the connection limit. So, we could
> additionally
> > > > > track the rate, and close the connection from IP that exceeds the
> rate.
> > > > One
> > > > > slight concern is whether keeping track of per IP rates and quotas
> adds
> > > > > overhead (CPU and memory). But perhaps it is not a problem if we
> use
> > > > > expiring sensors.
> > > > >
> > > > > It would still make sense to limit the overall connection creation
> rate
> > > > > for the Kafka clusters which are shared among many different
> > > > > applications/clients, since they may spike at the same time
> bringing
> > > the
> > > > > total rate too high.
> > > > >
> > > > > 3. Controlling connection queue sizes only controls the share of
> time
> > > > > network threads use for creating new connections (and accepting on
> > > > Acceptor
> > > > > thread) vs. doing other work on each Processor iteration. It does
> not
> > > > > directly control how processing connection creations would be
> related
> > > to
> > > > > other processing done by brokers like on request handler threads.
> So,
> > > > while
> > > > > controlling queue size may mitigate the issue for some of the
> > > workloads,
> > > > it
> > > > > does not guarantee that. Plus, if we want to limit how many
> connections
> > > > are
> > > > > created per IP, the queue size approach would not work, unless we
> go
> > > > with a
> > > > > "share" of the queue, which I think even further obscures what that
> > > > setting
> > > > > means (and what we would achieve as an end result). Does this
> answer
> > > the
> > > > > question?
> > > > >
> > > > > If there are no objections, I will update the KIP to add per IP
> > > > connection
> > > > > rate limits (config and enforcement).
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Anna
> > > > >
> > > > >
> > > > > On Tue, May 12, 2020 at 11:25 AM Alexandre Dupriez <
> > > > > alexandre.dupr...@gmail.com> wrote:
> > > > >
> > > > >> Hello,
> > > > >>
> > > > >> Thank you for the KIP.
> > > > >>
> > > > >> I experienced in the past genuine broker brownouts due to
> connection
> > > > >> storms consuming most of the CPU available on the server and I
> think
> > > > >> it is useful to protect against it.
> > > > >>
> > > > >> I tend to share the questions asked in points 2 and 4 from David.
> Is
> > > > >> there still a risk of denial of service if the limit applies at
> the
> > > > >> listener-level without differentiating between (an) “offending”
> > > > >> client(s) and the others?
> > > > >>
> > > > >> To rebound on point 3 - conceptually one difference between
> capping
> > > > >> the queue size or throttling as presented in the KIP would come
> from
> > > > >> the time it takes to accept a connection and how that time evolves
> > > > >> with the connection rate.
> > > > >> Assuming that that time increases monotonically with resource
> > > > >> utilization, the admissible rate of connections would decrease as
> the
> > > > >> server becomes more loaded, if the limit was set on queue size.
> > > > >>
> > > > >> Thanks,
> > > > >> Alexandre
> > > > >>
> > > > >> Le mar. 12 mai 2020 à 08:49, David Jacot <dja...@confluent.io> a
> > > écrit
> > > > :
> > > > >> >
> > > > >> > Hi Anna,
> > > > >> >
> > > > >> > Thanks for the KIP! I have few questions:
> > > > >> >
> > > > >> > 1. You mention that some clients may create a new connections
> for
> > > each
> > > > >> > requests: "Another example is clients that create a new
> connection
> > > for
> > > > >> each
> > > > >> > produce/consume request". I am curious here but do we know any
> > > clients
> > > > >> > behaving like this?
> > > > >> >
> > > > >> > 2. I am a bit concerned by the impact of misbehaving clients on
> the
> > > > >> other
> > > > >> > ones. Let's say that we define a quota of 10 connections / sec
> for a
> > > > >> broker
> > > > >> > and that we have a misbehaving application constantly trying to
> > > create
> > > > >> 20
> > > > >> > connections on that broker. That application will constantly
> hit the
> > > > >> quota
> > > > >> > and
> > > > >> > always have many pending connections in the queue waiting to be
> > > > >> accepted.
> > > > >> > Regular clients trying to connect would need to wait until all
> the
> > > > >> pending
> > > > >> > connections upfront in the queue are drained in the best case
> > > scenario
> > > > >> or
> > > > >> > won't be able to connect at all in the worst case scenario if
> the
> > > > queue
> > > > >> is
> > > > >> > full.
> > > > >> > Does it sound like a valid concern? How do you see this?
> > > > >> >
> > > > >> > 3. As you mention it in the KIP, we use bounded queues which
> already
> > > > >> limit
> > > > >> > the maximum number of connections that can be accepted. I
> wonder if
> > > we
> > > > >> > could reach the same goal by making the size of the queue
> > > > configurable.
> > > > >> >
> > > > >> > 4. Did you consider doing something similar to the connections
> quota
> > > > >> which
> > > > >> > limits the number of connections per IP? Instead of rate
> limiting
> > > all
> > > > >> the
> > > > >> > creation,
> > > > >> > we could perhaps rate limit the number of creation per IP as
> well.
> > > > That
> > > > >> > could
> > > > >> > perhaps reduce the effect on the other clients. That may be
> harder
> > > to
> > > > >> > implement
> > > > >> > though.
> > > > >> >
> > > > >> > Best,
> > > > >> > David
> > > > >> >
> > > > >> > On Mon, May 11, 2020 at 7:58 PM Anna Povzner <a...@confluent.io
> >
> > > > wrote:
> > > > >> >
> > > > >> > > Hi,
> > > > >> > >
> > > > >> > > I just created KIP-612 to allow limiting connection creation
> rate
> > > on
> > > > >> > > brokers, and would like to start a discussion.
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >>
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-612%3A+Ability+to+Limit+Connection+Creation+Rate+on+Brokers
> > > > >> > >
> > > > >> > > Feedback and suggestions are welcome!
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Anna
> > > > >> > >
> > > > >>
> > > > >
> > > >
> > >
>

Reply via email to