Hi Anna, Thanks for the response, sounds good.
Regards, Rajini On Sun, May 17, 2020 at 1:38 AM Anna Povzner <a...@confluent.io> wrote: > Hi Rajini, > > Thanks for reviewing the KIP! > > I agree with your suggestion to make per-IP connection rate quota a dynamic > quota for entity name IP. This will allow configuring connection rate for a > particular IP as well. I updated the wiki accordingly. > > Your second concern makes sense -- rejecting the connection right away will > likely cause a new connection from the same client. I am concerned about > delaying new connections for processing later, because if the connections > keep coming with the high rate, there may be potentially a large backlog > and connections may start timing out before the broker gets to processing > them. For example, if clients come through proxy, there may be > potentially a large number of incoming connections with the same IP. > > What do you think about the following option: > * Once per-IP connection rate reaches the limit, accept or drop (clean up) > the connection after a delay depending on whether the quota is still > violated. We could re-use the mechanism implemented with KIP-306 where the > broker delays the response for failed client authentication. The delay will > be set to min(delay calculated based on the rate quota, 1 second), which > matches the max delay for request quota. > > I think this option is somewhat your suggestion with delaying accepting per > IP connections that reached the rate limit, but with protection in place to > make sure the number of delayed connections does not blow up. What do you > think? > > Thanks, > Anna > > On Sat, May 16, 2020 at 1:09 AM Alexandre Dupriez < > alexandre.dupr...@gmail.com> wrote: > > > Hi Anna, > > > > Thank you for your answers and explanations. > > > > A couple of additional comments: > > > > 900. KIP-612 does not intend to dedicate a metric to the throttling of > > incoming connections. I wonder if such a metric would be handy for > > monitoring and help set-up metric-based alarming if one wishes to > > capture this type of incident? > > > > 901. Following-up on Rajini's point 2 above - from my understanding, > > this new quota should prevent excess CPU consumption in > > Processor#run() method when a new connection has been accepted. > > Through the throttling in place, connections will be delayed as > > indicated by the KIP's specifications: > > > > " If connection creation rate on the broker exceeds the broker-wide > > limit, the broker will delay accepting a new connection by an amount > > of time that brings the rate within the limit." > > > > You may be referring to the following sentence: > > > > "A new broker configuration option will be added to limit the rate at > > which connections will be accepted for each IP address. New > > connections for the IP will be dropped once the limit is reached."? > > > > 902. It may be interesting to capture the data with and without > > connection throttling under stress scenarios. You may have these data > > already. If you need a pair of hands to do some stress tests once you > > have a POC or a PR, I am happy to contribute :) > > > > Thanks, > > Alexandre > > > > Le ven. 15 mai 2020 à 12:22, Rajini Sivaram <rajinisiva...@gmail.com> a > > écrit : > > > > > > Hi Anna, > > > > > > Thanks for the KIP, looks good overall. A couple of comments about > per-IP > > > connection quotas: > > > > > > 1) Should we consider making per-IP quota similar to other quotas? > > > Configured as a dynamic quota for entity type IP, with per-IP limit as > > well > > > as defaults? Perhaps that would fit better rather than configs? > > > > > > 2) The current proposal drops connections after accepting connections > for > > > per-IP limit. We do this in other cases too, but in this case, should > we > > > throttle instead? My point is what is the quota protecting? If we want > to > > > limit rate of accepted connections, then accepting a connection and > then > > > dropping doesn't really help since that IP is going to reconnect. If we > > > want to rate limit what happens next, i.e. authentication, then > > > throttling the accepted connection so its processing is delayed would > > > perhaps be better? > > > > > > Regards, > > > > > > Rajini > > > > > > On Thu, May 14, 2020 at 4:12 PM David Jacot <dja...@confluent.io> > wrote: > > > > > > > Hi Anna, > > > > > > > > Thanks for your answers and the updated KIP. Looks good to me! > > > > > > > > Best, > > > > David > > > > > > > > On Thu, May 14, 2020 at 12:54 AM Anna Povzner <a...@confluent.io> > > wrote: > > > > > > > > > I updated the KIP to add a new broker configuration to limit > > connection > > > > > creation rate per IP: max.connection.creation.rate.per.ip. Once the > > limit > > > > > is reached for a particular IP address, the broker will reject the > > > > > connection from that IP (close the connection it accepted) and > > continue > > > > > rejecting them until the rate is back within the rate limit. > > > > > > > > > > On Wed, May 13, 2020 at 11:46 AM Anna Povzner <a...@confluent.io> > > wrote: > > > > > > > > > > > Hi David and Alexandre, > > > > > > > > > > > > Thanks so much for your feedback! Here are my answers: > > > > > > > > > > > > 1. Yes, we have seen several cases of clients that create a new > > > > > connection > > > > > > per produce/consume request. One hypothesis is someone who is > used > > to > > > > > > connection pooling may accidentally write a Kafka client that > > creates a > > > > > new > > > > > > connection every time. > > > > > > > > > > > > 2 & 4. That's a good point I haven't considered. I think it makes > > sense > > > > > to > > > > > > provide an ability to limit connection creations per IP as well. > > This > > > > is > > > > > > not hard to implement -- the broker already keeps track of the > > number > > > > of > > > > > > connections per IP, and immediately closes a new connection if it > > comes > > > > > > from an IP that reached the connection limit. So, we could > > additionally > > > > > > track the rate, and close the connection from IP that exceeds the > > rate. > > > > > One > > > > > > slight concern is whether keeping track of per IP rates and > quotas > > adds > > > > > > overhead (CPU and memory). But perhaps it is not a problem if we > > use > > > > > > expiring sensors. > > > > > > > > > > > > It would still make sense to limit the overall connection > creation > > rate > > > > > > for the Kafka clusters which are shared among many different > > > > > > applications/clients, since they may spike at the same time > > bringing > > > > the > > > > > > total rate too high. > > > > > > > > > > > > 3. Controlling connection queue sizes only controls the share of > > time > > > > > > network threads use for creating new connections (and accepting > on > > > > > Acceptor > > > > > > thread) vs. doing other work on each Processor iteration. It does > > not > > > > > > directly control how processing connection creations would be > > related > > > > to > > > > > > other processing done by brokers like on request handler threads. > > So, > > > > > while > > > > > > controlling queue size may mitigate the issue for some of the > > > > workloads, > > > > > it > > > > > > does not guarantee that. Plus, if we want to limit how many > > connections > > > > > are > > > > > > created per IP, the queue size approach would not work, unless we > > go > > > > > with a > > > > > > "share" of the queue, which I think even further obscures what > that > > > > > setting > > > > > > means (and what we would achieve as an end result). Does this > > answer > > > > the > > > > > > question? > > > > > > > > > > > > If there are no objections, I will update the KIP to add per IP > > > > > connection > > > > > > rate limits (config and enforcement). > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Anna > > > > > > > > > > > > > > > > > > On Tue, May 12, 2020 at 11:25 AM Alexandre Dupriez < > > > > > > alexandre.dupr...@gmail.com> wrote: > > > > > > > > > > > >> Hello, > > > > > >> > > > > > >> Thank you for the KIP. > > > > > >> > > > > > >> I experienced in the past genuine broker brownouts due to > > connection > > > > > >> storms consuming most of the CPU available on the server and I > > think > > > > > >> it is useful to protect against it. > > > > > >> > > > > > >> I tend to share the questions asked in points 2 and 4 from > David. > > Is > > > > > >> there still a risk of denial of service if the limit applies at > > the > > > > > >> listener-level without differentiating between (an) “offending” > > > > > >> client(s) and the others? > > > > > >> > > > > > >> To rebound on point 3 - conceptually one difference between > > capping > > > > > >> the queue size or throttling as presented in the KIP would come > > from > > > > > >> the time it takes to accept a connection and how that time > evolves > > > > > >> with the connection rate. > > > > > >> Assuming that that time increases monotonically with resource > > > > > >> utilization, the admissible rate of connections would decrease > as > > the > > > > > >> server becomes more loaded, if the limit was set on queue size. > > > > > >> > > > > > >> Thanks, > > > > > >> Alexandre > > > > > >> > > > > > >> Le mar. 12 mai 2020 à 08:49, David Jacot <dja...@confluent.io> > a > > > > écrit > > > > > : > > > > > >> > > > > > > >> > Hi Anna, > > > > > >> > > > > > > >> > Thanks for the KIP! I have few questions: > > > > > >> > > > > > > >> > 1. You mention that some clients may create a new connections > > for > > > > each > > > > > >> > requests: "Another example is clients that create a new > > connection > > > > for > > > > > >> each > > > > > >> > produce/consume request". I am curious here but do we know any > > > > clients > > > > > >> > behaving like this? > > > > > >> > > > > > > >> > 2. I am a bit concerned by the impact of misbehaving clients > on > > the > > > > > >> other > > > > > >> > ones. Let's say that we define a quota of 10 connections / sec > > for a > > > > > >> broker > > > > > >> > and that we have a misbehaving application constantly trying > to > > > > create > > > > > >> 20 > > > > > >> > connections on that broker. That application will constantly > > hit the > > > > > >> quota > > > > > >> > and > > > > > >> > always have many pending connections in the queue waiting to > be > > > > > >> accepted. > > > > > >> > Regular clients trying to connect would need to wait until all > > the > > > > > >> pending > > > > > >> > connections upfront in the queue are drained in the best case > > > > scenario > > > > > >> or > > > > > >> > won't be able to connect at all in the worst case scenario if > > the > > > > > queue > > > > > >> is > > > > > >> > full. > > > > > >> > Does it sound like a valid concern? How do you see this? > > > > > >> > > > > > > >> > 3. As you mention it in the KIP, we use bounded queues which > > already > > > > > >> limit > > > > > >> > the maximum number of connections that can be accepted. I > > wonder if > > > > we > > > > > >> > could reach the same goal by making the size of the queue > > > > > configurable. > > > > > >> > > > > > > >> > 4. Did you consider doing something similar to the connections > > quota > > > > > >> which > > > > > >> > limits the number of connections per IP? Instead of rate > > limiting > > > > all > > > > > >> the > > > > > >> > creation, > > > > > >> > we could perhaps rate limit the number of creation per IP as > > well. > > > > > That > > > > > >> > could > > > > > >> > perhaps reduce the effect on the other clients. That may be > > harder > > > > to > > > > > >> > implement > > > > > >> > though. > > > > > >> > > > > > > >> > Best, > > > > > >> > David > > > > > >> > > > > > > >> > On Mon, May 11, 2020 at 7:58 PM Anna Povzner < > a...@confluent.io > > > > > > > > wrote: > > > > > >> > > > > > > >> > > Hi, > > > > > >> > > > > > > > >> > > I just created KIP-612 to allow limiting connection creation > > rate > > > > on > > > > > >> > > brokers, and would like to start a discussion. > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-612%3A+Ability+to+Limit+Connection+Creation+Rate+on+Brokers > > > > > >> > > > > > > > >> > > Feedback and suggestions are welcome! > > > > > >> > > > > > > > >> > > Thanks, > > > > > >> > > Anna > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > >