Great to see this KIP and the excellent discussion.

To me, Jun's suggestion makes sense.  If my application is allocated 1
request handler unit, then it's as if I have a Kafka broker with a single
request handler thread dedicated to me.  That's the most I can use, at
least.  That allocation doesn't change even if an admin later increases the
size of the request thread pool on the broker.  It's similar to the CPU
abstraction that VMs and containers get from hypervisors or OS schedulers.
While different client access patterns can use wildly different amounts of
request thread resources per request, a given application will generally
have a stable access pattern and can figure out empirically how many
"request thread units" it needs to meet it's throughput/latency goals.

Cheers,

Roger

On Wed, Feb 22, 2017 at 8:53 AM, Jun Rao <j...@confluent.io> wrote:

> Hi, Rajini,
>
> Thanks for the updated KIP. A few more comments.
>
> 1. A concern of request_time_percent is that it's not an absolute value.
> Let's say you give a user a 10% limit. If the admin doubles the number of
> request handler threads, that user now actually has twice the absolute
> capacity. This may confuse people a bit. So, perhaps setting the quota
> based on an absolute request thread unit is better.
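>
> A tiny illustration of the concern, with made-up numbers (Scala, just the
> arithmetic):
>
>     val quota = 0.10                // 10% of the request handler pool
>     val capacityBefore = quota * 8  // 8 handler threads  -> 0.8 threads
>     val capacityAfter  = quota * 16 // pool doubled       -> 1.6 threads
>     // An absolute quota of, say, 0.8 handler units would stay fixed instead.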
>
> 2. ControlledShutdownRequest is also an inter-broker request and needs to
> be excluded from throttling.
>
> 3. Implementation-wise, I am wondering if it's simpler to apply the request
> time throttling first in KafkaApis.handle(). Otherwise, we will need to add
> the throttling logic to each request type.
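>
> Roughly what I have in mind (a self-contained sketch only, with simplified
> hypothetical types rather than the real KafkaApis/quota classes):
>
>     object HandleSketch {
>       case class Request(apiKey: Int, clientId: String)
>
>       // Hypothetical quota manager: records the handler time this client has
>       // used and returns how long its next response should be delayed.
>       trait RequestTimeQuotaManager {
>         def maybeRecordAndThrottle(r: Request): Long
>       }
>
>       def handle(request: Request, quotas: RequestTimeQuotaManager): Unit = {
>         // One throttling decision at the top of handle(), instead of
>         // repeating the logic in every per-request handler.
>         val delayMs = quotas.maybeRecordAndThrottle(request)
>         if (delayMs > 0) {
>           // In the broker this would delay the response via a delay queue /
>           // purgatory; a sleep here only stands in for that.
>           Thread.sleep(delayMs)
>         }
>         request.apiKey match {
>           case 0 => handleProduce(request) // ApiKeys.PRODUCE
>           case 1 => handleFetch(request)   // ApiKeys.FETCH
>           case _ => handleOther(request)
>         }
>       }
>
>       def handleProduce(r: Request): Unit = ()
>       def handleFetch(r: Request): Unit = ()
>       def handleOther(r: Request): Unit = ()
>     }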
>
> Thanks,
>
> Jun
>
> On Wed, Feb 22, 2017 at 5:58 AM, Rajini Sivaram <rajinisiva...@gmail.com>
> wrote:
>
> > Jun,
> >
> > Thank you for the review.
> >
> > I have reverted to the original KIP that throttles based on request
> handler
> > utilization. At the moment, it uses percentage, but I am happy to change
> to
> > a fraction (out of 1 instead of 100) if required. I have added the
> examples
> > from this discussion to the KIP. Also added a "Future Work" section to
> > address network thread utilization. The configuration is named
> > "request_time_percent" with the expectation that it can also be used as
> the
> > limit for network thread utilization when that is implemented, so that
> > users have to set only one config for the two and not have to worry about
> > the internal distribution of the work between the two thread pools in
> > Kafka.
> >
> >
> > Regards,
> >
> > Rajini
> >
> >
> > On Wed, Feb 22, 2017 at 12:23 AM, Jun Rao <j...@confluent.io> wrote:
> >
> > > Hi, Rajini,
> > >
> > > Thanks for the proposal.
> > >
> > > The benefit of using the request processing time over the request rate
> is
> > > exactly what people have said. I will just expand that a bit. Consider
> > the
> > > following case. The producer sends a produce request with a 10MB
> message
> > > but compressed to 100KB with gzip. The decompression of the message on
> > the
> > > broker could take 10-15 seconds, during which time, a request handler
> > > thread is completely blocked. In this case, neither the byte-in quota
> nor
> > > the request rate quota may be effective in protecting the broker.
> > Consider
> > > another case. A consumer group starts with 10 instances and later on
> > > switches to 20 instances. The request rate will likely double, but the
> > > actual load on the broker may not double since each fetch request only
> only
> > > contains half of the partitions. Request rate quota may not be easy to
> > > configure in this case.
> > >
> > > What we really want is to be able to prevent a client from using too
> much
> > > of the server side resources. In this particular KIP, this resource is
> > the
> > > capacity of the request handler threads. I agree that it may not be
> > > intuitive for the users to determine how to set the right limit.
> However,
> > > this is not completely new and has been done in the container world
> > > already. For example, Linux cgroup (https://access.redhat.com/
> > > documentation/en-US/Red_Hat_Enterprise_Linux/6/html/
> > > Resource_Management_Guide/sec-cpu.html) has the concept of
> > > cpu.cfs_quota_us,
> > > which specifies the total amount of time in microseconds for which all
> > > tasks in a cgroup can run during a one second period. We can
> potentially
> > > model the request handler threads in a similar way. For example, each
> > > request handler thread can be 1 request handler unit and the admin can
> > > configure a limit on how many units (say 0.01) a client can have.
> > >
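> > > To make the analogy concrete, a small sketch of what such a unit-based
> > > limit could translate to (hypothetical names, made-up numbers):
> > >
> > >     val quotaUnits = 0.01        // 0.01 request handler units, as above
> > >     val windowMs   = 1000L       // 1 second window, like cfs_period_us
> > >     // allowed handler time per window, independent of the pool size
> > >     val allowedNanosPerWindow = (quotaUnits * windowMs * 1000000L).toLong
> > >     // => 10 ms of request handler time per second for this client
> > >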
> > > Regarding not throttling the internal broker to broker requests. We
> could
> > > do that. Alternatively, we could just let the admin configure a high
> > limit
> > > for the kafka user (though that may not be easy to do based on clientId).
> > >
> > > Ideally we want to be able to protect the utilization of the network
> > thread
> > pool too. The difficulty is mostly what Rajini said: (1) The mechanism
> for
> > > throttling the requests is through Purgatory and we will have to think
> > > through how to integrate that into the network layer.  (2) In the
> network
> > > layer, currently we know the user, but not the clientId of the request.
> > So,
> > > it's a bit tricky to throttle based on clientId there. Plus, the
> byteOut
> > > quota can already protect the network thread utilization for fetch
> > > requests. So, if we can't figure out this part right now, just focusing
> > on
> > > the request handling threads for this KIP is still a useful feature.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Tue, Feb 21, 2017 at 4:27 AM, Rajini Sivaram <
> rajinisiva...@gmail.com
> > >
> > > wrote:
> > >
> > > > Thank you all for the feedback.
> > > >
> > > > Jay: I have removed the exemption for consumer heartbeat etc. Agree that
> > > > protecting the cluster is more important than protecting individual
> > apps.
> > > > Have retained the exemption for StopReplica/LeaderAndIsr etc.; these
> > are
> > > > throttled only if authorization fails (so can't be used for DoS
> attacks
> > > in
> > > > a secure cluster, but allows inter-broker requests to complete
> without
> > > > delays).
> > > >
> > > > I will wait another day to see if there is any objection to quotas
> > based
> > > on
> > > > request processing time (as opposed to request rate) and if there are
> > no
> > > > objections, I will revert to the original proposal with some changes.
> > > >
> > > > The original proposal only included the time used by the request
> > > > handler threads (that made calculation easy). I think the suggestion
> is
> > > to
> > > > include the time spent in the network threads as well since that may
> be
> > > > significant. As Jay pointed out, it is more complicated to calculate
> > the
> > > > total available CPU time and convert to a ratio when there are *m* I/O
> > > threads
> > > > and *n* network threads. ThreadMXBean#getThreadCpuTime() may give us
> > > what
> > > > we want, but it can be very expensive on some platforms. As Becket
> and
> > > > Guozhang have pointed out, we do have several time measurements
> already
> > > for
> > > > generating metrics that we could use, though we might want to switch
> to
> > > > nanoTime() instead of currentTimeMillis() since some of the values
> for
> > > > small requests may be < 1ms. But rather than add up the time spent in
> > I/O
> > > > thread and network thread, wouldn't it be better to convert the time
> > > spent
> > > > on each thread into a separate ratio? UserA has a request quota of
> 5%.
> > > Can
> > > > we take that to mean that UserA can use 5% of the time on network
> > threads
> > > > and 5% of the time on I/O threads? If either is exceeded, the
> response
> > is
> > > > throttled - it would mean maintaining two sets of metrics for the two
> > > > durations, but would result in more meaningful ratios. We could
> define
> > > two
> > > > quota limits (UserA has 5% of request threads and 10% of network
> > > threads),
> > > > but that seems unnecessary and harder to explain to users.
> > > >
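> > > > A rough sketch of the "one quota, two ratios" idea (names hypothetical):
> > > >
> > > >     // Usage of one thread pool by one user within a quota window.
> > > >     case class PoolUsage(timeUsedNanos: Long, poolSize: Int, windowNanos: Long) {
> > > >       def ratio: Double = timeUsedNanos.toDouble / (poolSize.toLong * windowNanos)
> > > >     }
> > > >
> > > >     // The same 5% limit is checked independently against each pool, so
> > > >     // the ratio stays meaningful whatever the relative pool sizes are.
> > > >     def shouldThrottle(quota: Double, io: PoolUsage, network: PoolUsage): Boolean =
> > > >       io.ratio > quota || network.ratio > quota
> > > >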
> > > > Back to why and how quotas are applied to network thread utilization:
> > > > a) In the case of fetch,  the time spent in the network thread may be
> > > > significant and I can see the need to include this. Are there other
> > > > requests where the network thread utilization is significant? In the
> > case
> > > > of fetch, request handler thread utilization would throttle clients
> > with
> > > > high request rate and low data volume, and the fetch byte rate quota will
> > > throttle
> > > > clients with high data volume. Network thread utilization is perhaps
> > > > proportional to the data volume. I am wondering if we even need to
> > > throttle
> > > > based on network thread utilization or whether the data volume quota
> > > covers
> > > > this case.
> > > >
> > > > b) At the moment, we record and check for quota violation at the same
> > > time.
> > > > If a quota is violated, the response is delayed. Using Jay's example
> of
> > > > disk reads for fetches happening in the network thread, we can't
> record
> > > and
> > > > delay a response after the disk reads. We could record the time spent
> > on
> > > > the network thread when the response is complete and introduce a
> delay
> > > for
> > > > handling a subsequent request (separate out recording and quota
> > violation
> > > > handling in the case of network thread overload). Does that make
> sense?
> > > >
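> > > > Something like this is what I mean by separating recording from
> > > > violation handling (a sketch with hypothetical names; window rotation
> > > > omitted for brevity):
> > > >
> > > >     class NetworkTimeQuota(allowedNanosPerWindow: Long) {
> > > >       private var usedNanos: Long = 0L
> > > >
> > > >       // recorded only after the response has been fully sent
> > > >       def record(networkThreadNanos: Long): Unit =
> > > >         usedNanos += networkThreadNanos
> > > >
> > > >       // applied when the *next* request from this client arrives
> > > >       def delayForNextRequestMs: Long = {
> > > >         val overage = usedNanos - allowedNanosPerWindow
> > > >         if (overage > 0) overage / 1000000L else 0L
> > > >       }
> > > >     }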
> > > >
> > > > Regards,
> > > >
> > > > Rajini
> > > >
> > > >
> > > > On Tue, Feb 21, 2017 at 2:58 AM, Becket Qin <becket....@gmail.com>
> > > wrote:
> > > >
> > > > > Hey Jay,
> > > > >
> > > > > Yeah, I agree that enforcing the CPU time is a little tricky. I am
> > > > thinking
> > > > > that maybe we can use the existing request statistics. They are
> > already
> > > > > very detailed so we can probably see the approximate CPU time from
> > them,
> > > > e.g.
> > > > > something like (total_time - request/response_queue_time -
> > > remote_time).
> > > > >
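> > > > > In code form the idea is roughly this (parameter names here just
> > > > > describe the metrics, they are not literal field names):
> > > > >
> > > > >     def approxHandlerTimeMs(totalTimeMs: Double,
> > > > >                             requestQueueTimeMs: Double,
> > > > >                             responseQueueTimeMs: Double,
> > > > >                             remoteTimeMs: Double): Double =
> > > > >       totalTimeMs - requestQueueTimeMs - responseQueueTimeMs - remoteTimeMs
> > > > >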
> > > > > I agree with Guozhang that when a user is throttled it is likely
> that
> > > we
> > > > > need to see if anything has gone wrong first, and if the users are
> > well
> > > > > behaving and just need more resources, we will have to bump up the
> > > quota
> > > > > for them. It is true that pre-allocating CPU time quota precisely
> for
> > > the
> > > > > users is difficult. So in practice it would probably be more like
> > first
> > > > set
> > > > > a relatively high protective CPU time quota for everyone and increase
> > > that
> > > > > for some individual clients on demand.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > >
> > > > > On Mon, Feb 20, 2017 at 5:48 PM, Guozhang Wang <wangg...@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > This is a great proposal, glad to see it happening.
> > > > > >
> > > > > > I am also inclined towards CPU throttling, or more specifically the
> > > > > > processing time ratio, rather than request rate throttling. Becket has
> > very
> > > > > well
> > > > > > summed up my rationales above, and one thing to add here is that the
> > > > former
> > > > > > has good support for both "protecting against rogue clients" as
> > > well
> > > > as
> > > > > > "utilizing a cluster for multi-tenancy usage": when thinking
> about
> > > how
> > > > to
> > > > > > explain this to the end users, I find it actually more natural
> than
> > > the
> > > > > > request rate since as mentioned above, different requests will
> have
> > > > quite
> > > > > > different "cost", and Kafka today already has various request
> > types
> > > > > > (produce, fetch, admin, metadata, etc.); because of that, the
> request
> > > > rate
> > > > > > throttling may not be as effective unless it is set very
> > > > conservatively.
> > > > > >
> > > > > > Regarding user reactions when they are throttled, I think it may
> may
> > > > > differ
> > > > > > case-by-case, and need to be discovered / guided by looking at
> > > relative
> > > > > > metrics. So in other words users would not expect to get
> additional
> > > > > > information by simply being told "hey, you are throttled", which
> is
> > > all
> > > > > > that throttling does; they need to take a follow-up step and see
> > > "hmm,
> > > > > I'm
> > > > > > throttled probably because of ..", which is by looking at other
> > > metric
> > > > > > values: e.g. whether I'm bombarding the brokers with metadata
> > > requests,
> > > > > > which are usually cheap to handle but I'm sending thousands per
> > > second;
> > > > > or
> > > > > > is it because I'm catching up and hence sending very heavy
> fetch
> > > > > requests
> > > > > > with large min.bytes, etc.
> > > > > >
> > > > > > Regarding the implementation, as once discussed with Jun, this
> > > seems
> > > > > not
> > > > > > very difficult since today we are already collecting the "thread
> > pool
> > > > > > utilization" metrics, which is a single percentage
> > > "aggregateIdleMeter"
> > > > > > value; but we are already effectively aggregating it for each
> > > request
> > > > in
> > > > > > KafkaRequestHandler, and we can just extend it by recording the
> > > source
> > > > > > client id when handling them and aggregating by clientId as well
> as
> > > the
> > > > > > total aggregate.
> > > > > >
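> > > > > > In other words, something along these lines in the handler run loop
> > > > > > (a simplified sketch; the real loop also feeds the aggregate idle meter):
> > > > > >
> > > > > >     import java.util.concurrent.ConcurrentHashMap
> > > > > >     import java.util.concurrent.atomic.AtomicLong
> > > > > >
> > > > > >     val handlerTimeByClient = new ConcurrentHashMap[String, AtomicLong]()
> > > > > >
> > > > > >     def recordHandlerTime(clientId: String, nanos: Long): Unit = {
> > > > > >       handlerTimeByClient.putIfAbsent(clientId, new AtomicLong(0L))
> > > > > >       handlerTimeByClient.get(clientId).addAndGet(nanos)
> > > > > >     }
> > > > > >
> > > > > >     // around each request:
> > > > > >     //   val start = System.nanoTime()
> > > > > >     //   apis.handle(request)
> > > > > >     //   recordHandlerTime(request.clientId, System.nanoTime() - start)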
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Feb 20, 2017 at 4:27 PM, Jay Kreps <j...@confluent.io>
> > wrote:
> > > > > >
> > > > > > > Hey Becket/Rajini,
> > > > > > >
> > > > > > > When I thought about it more deeply I came around to the
> "percent
> > > of
> > > > > > > processing time" metric too. It seems a lot closer to the thing
> > we
> > > > > > actually
> > > > > > > care about and need to protect. I also think this would be a
> very
> > > > > useful
> > > > > > > metric even in the absence of throttling just to debug who's
> > using
> > > > > > > capacity.
> > > > > > >
> > > > > > > Two problems to consider:
> > > > > > >
> > > > > > >    1. I agree that for the user it is understandable what led
> to
> > > > their
> > > > > > >    being throttled, but it is a bit hard to figure out the safe
> > > range
> > > > > for
> > > > > > >    them. i.e. if I have a new app that will send 200
> > messages/sec I
> > > > can
> > > > > > >    probably reason that I'll be under the throttling limit of
> 300
> > > > > > req/sec.
> > > > > > >    However if I need to be under a 10% CPU resources limit it
> may
> > > be
> > > > a
> > > > > > bit
> > > > > > >    harder for me to know a priori if I will or won't.
> > > > > > >    2. Calculating the available CPU time is a bit difficult
> since
> > > > there
> > > > > > are
> > > > > > >    actually two thread pools--the I/O threads and the network
> > > > threads.
> > > > > I
> > > > > > > think
> > > > > > >    it might be workable to count just the I/O thread time as in
> > the
> > > > > > > proposal,
> > > > > > >    but the network thread work is actually non-trivial (e.g.
> all
> > > the
> > > > > disk
> > > > > > >    reads for fetches happen in that thread). If you count both
> > the
> > > > > > network
> > > > > > > and
> > > > > > >    I/O threads it can skew things a bit. E.g. say you have 50
> > > network
> > > > > > > threads,
> > > > > > >    10 I/O threads, and 8 cores, what is the available CPU time in a
> > > > > > >    second? I suppose this is a problem whenever you have a
> > > bottleneck
> > > > > > > between
> > > > > > >    I/O and network threads or if you end up significantly
> > > > > > over-provisioning
> > > > > > >    one pool (both of which are hard to avoid).
> > > > > > >
> > > > > > > An alternative for CPU throttling would be to use this api:
> > > > > > > http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/
> > > > > > > management/ThreadMXBean.html#getThreadCpuTime(long)
> > > > > > >
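> > > > > > > Sketched out, usage could look roughly like this (note the
> > > > > > > support/enabled checks, since per-thread CPU measurement may be
> > > > > > > unavailable or switched off):
> > > > > > >
> > > > > > >     import java.lang.management.ManagementFactory
> > > > > > >
> > > > > > >     val bean = ManagementFactory.getThreadMXBean
> > > > > > >     if (bean.isThreadCpuTimeSupported && bean.isThreadCpuTimeEnabled) {
> > > > > > >       // CPU nanoseconds used by this thread; -1 if not available
> > > > > > >       val cpuNanos = bean.getThreadCpuTime(Thread.currentThread().getId)
> > > > > > >     }
> > > > > > >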
> > > > > > > That would let you track actual CPU usage across the network,
> I/O
> > > > > > threads,
> > > > > > > and purgatory threads and look at it as a percentage of total
> > > cores.
> > > > I
> > > > > > > think this fixes many problems in the reliability of the
> metric.
> > > Its
> > > > > > > meaning is slightly different as it is just CPU (you don't get
> > > > charged
> > > > > > for
> > > > > > > time blocking on I/O) but that may be okay because we already
> > have
> > > a
> > > > > > > throttle on I/O. The downside is I think it is possible this
> api
> > > can
> > > > be
> > > > > > > disabled or isn't always available and it may also be expensive
> > > (also
> > > > > > I've
> > > > > > > never used it so not sure if it really works the way I think).
> > > > > > >
> > > > > > > -Jay
> > > > > > >
> > > > > > > On Mon, Feb 20, 2017 at 3:17 PM, Becket Qin <
> > becket....@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > If the purpose of the KIP is only to protect the cluster from
> > > being
> > > > > > > > overwhelmed by crazy clients and is not intended to address
> > > > resource
> > > > > > > > allocation problem among the clients, I am wondering if using
> > > > request
> > > > > > > > handling time quota (CPU time quota) is a better option. Here
> > are
> > > > the
> > > > > > > > reasons:
> > > > > > > >
> > > > > > > > 1. request handling time quota has better protection. Say we
> > have
> > > > > > request
> > > > > > > > rate quota and set that to some value like 100 requests/sec,
> it
> > > is
> > > > > > > possible
> > > > > > > > that some of the requests are very expensive and actually take a
> > lot
> > > of
> > > > > > time
> > > > > > > to
> > > > > > > > handle. In that case a few clients may still occupy a lot of
> > CPU
> > > > time
> > > > > > > even if
> > > > > > > > the request rate is low. Arguably we can carefully set
> request
> > > rate
> > > > > > quota
> > > > > > > > for each request and client id combination, but it could
> still
> > be
> > > > > > tricky
> > > > > > > to
> > > > > > > > get it right for everyone.
> > > > > > > >
> > > > > > > > If we use the request handling time quota, we can simply say that
> > > > > > > > no client can take up more than 30% of the total request handling
> capacity
> > > > > > (measured
> > > > > > > > by time), regardless of the difference among different
> requests
> > > or
> > > > > > > > what the client is doing. In this case maybe we can quota all the
> > > requests
> > > > if
> > > > > > we
> > > > > > > > want to.
> > > > > > > >
> > > > > > > > 2. The main benefit of using a request rate limit is that it
> > seems
> > > > more
> > > > > > > > intuitive. It is true that it is probably easier to explain
> to
> > > the
> > > > > user
> > > > > > > > what that means. However, in practice it seems the impact
> > of
> > > > > > request
> > > > > > > > rate quota is not more quantifiable than the request handling
> > > time
> > > > > > quota.
> > > > > > > > Unlike the byte rate quota, it is still difficult to give a
> > > number
> > > > > > about
> > > > > > > > the impact on throughput or latency when a request rate quota is
> > hit.
> > > > So
> > > > > it
> > > > > > > is
> > > > > > > > not better than the request handling time quota. In fact I
> feel
> > > it
> > > > is
> > > > > > > > clearer to tell the user that "you are limited because you have
> > taken
> > > > 30%
> > > > > > of
> > > > > > > > the CPU time on the broker" than something like
> "your
> > > > > request
> > > > > > > > rate quota on metadata requests has been reached".
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Feb 20, 2017 at 2:23 PM, Jay Kreps <j...@confluent.io
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > > I think this proposal makes a lot of sense (especially now
> > that
> > > > it
> > > > > is
> > > > > > > > > oriented around request rate) and fills the biggest
> remaining
> > > gap
> > > > > in
> > > > > > > the
> > > > > > > > > multi-tenancy story.
> > > > > > > > >
> > > > > > > > > I think for intra-cluster communication (StopReplica, etc)
> we
> > > > could
> > > > > > > avoid
> > > > > > > > > throttling entirely. You can secure or otherwise lock-down
> > the
> > > > > > cluster
> > > > > > > > > communication to prevent any unauthorized external party from
> > > > trying
> > > > > to
> > > > > > > > > initiate these requests. As a result we are as likely to
> > cause
> > > > > > problems
> > > > > > > > as
> > > > > > > > > solve them by throttling these, right?
> > > > > > > > >
> > > > > > > > > I'm not so sure that we should exempt the consumer requests
> > > such
> > > > as
> > > > > > > > > heartbeat. It's true that if we throttle an app's heartbeat
> > > > > requests
> > > > > > it
> > > > > > > > may
> > > > > > > > > cause it to fall out of its consumer group. However if we
> > don't
> > > > > > > throttle
> > > > > > > > it,
> > > > > > > > > it may DDOS the cluster if the heartbeat interval is set
> > > > > incorrectly
> > > > > > or
> > > > > > > > if
> > > > > > > > > some client in some language has a bug. I think the policy
> > with
> > > > > this
> > > > > > > kind
> > > > > > > > > of throttling is to protect the cluster above any
> individual
> > > app,
> > > > > > > right?
> > > > > > > > I
> > > > > > > > > think in general this should be okay since for most
> > deployments
> > > > > this
> > > > > > > > > setting is meant as more of a safety valve---that is rather
> > > than
> > > > > set
> > > > > > > > > something very close to what you expect to need (say 2
> > req/sec
> > > or
> > > > > > > > whatever)
> > > > > > > > > you would have something quite high (like 100 req/sec) with
> > > this
> > > > > > meant
> > > > > > > to
> > > > > > > > > prevent a client gone crazy. I think when used this way
> > > allowing
> > > > > > those
> > > > > > > to
> > > > > > > > > be throttled would actually provide meaningful protection.
> > > > > > > > >
> > > > > > > > > -Jay
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Feb 17, 2017 at 9:05 AM, Rajini Sivaram <
> > > > > > > rajinisiva...@gmail.com
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I have just created KIP-124 to introduce request rate
> > quotas
> > > to
> > > > > > > Kafka:
> > > > > > > > > >
> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > > 124+-+Request+rate+quotas
> > > > > > > > > >
> > > > > > > > > > The proposal is for a simple percentage request handling
> > time
> > > > > quota
> > > > > > > > that
> > > > > > > > > > can be allocated to *<client-id>*, *<user>* or *<user,
> > > > > client-id>*.
> > > > > > > > There
> > > > > > > > > > are a few other suggestions also under "Rejected
> > > alternatives".
> > > > > > > > Feedback
> > > > > > > > > > and suggestions are welcome.
> > > > > > > > > >
> > > > > > > > > > Thank you...
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > >
> > > > > > > > > > Rajini
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > > >
> > > > >
> > > >
> > >
> >
>
