Hi, Anna,

Thanks for the comment. For the problem you described, perhaps we need to
make the quota checking and recording more atomic?
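
For illustration, here is a minimal sketch of what I mean (the class
members and method names are made up for this example, not the actual
Sensor/Rate API): record the value and compute the throttle time under
the same lock, so each concurrent request observes a rate that already
includes the requests recorded before it.

    // Hypothetical sketch only, not the actual implementation.
    synchronized long recordAndGetThrottleTimeMs(double value, long nowMs) {
        rate.record(value, nowMs);              // update the samples
        double observed = rate.measure(nowMs);  // rate including this request
        if (observed <= quota)
            return 0;
        // delay needed for the observed rate to fall back under the quota
        return computeThrottleTimeMs(observed, quota, nowMs);
    }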

Hi, David,

Thanks for the updated KIP. Looks good to me now. Just one minor comment
below.

30. controller_mutations_rate: For topic creation and deletion, is the rate
accumulated at the topic or partition level? It would be useful to make it
clear in the wiki.

Jun

On Fri, Jun 5, 2020 at 7:23 AM David Jacot <dja...@confluent.io> wrote:

> Hi Anna and Jun,
>
> You are right. We should allocate up to the quota for each old sample.
>
> I have revamped the Throttling Algorithm section to better explain our
> thought process and the token bucket inspiration.
>
> I have also added a chapter with a few guidelines about how to define
> the quota. There is no magic formula for this but I give a few insights.
> I don't have specific numbers that can be used out of the box, so I
> think that it is better not to put any for the time being. We can always
> complement them later on in the documentation.
>
> Please, take a look and let me know what you think.
>
> Cheers,
> David
>
> On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner <a...@confluent.io> wrote:
>
> > Hi David and Jun,
> >
> > I dug a bit deeper into the Rate implementation, and wanted to confirm
> > and summarize why I believe the token bucket behavior is better, for
> > the reasons we already discussed. The main difference between Rate and
> > token bucket is that the Rate implementation allows a burst by
> > borrowing from the future, whereas a token bucket allows a burst by
> > using accumulated tokens from the previous idle period. Using
> > accumulated tokens smoothes out the rate measurement in general. With
> > the Rate implementation, configuring a large burst requires configuring
> > a large quota window, which causes long delays for bursty workloads due
> > to borrowing credits from the future. Perhaps it would be useful to add
> > a summary at the beginning of the Throttling Algorithm section?
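> >
> > To make the contrast concrete, here is a minimal token bucket sketch
> > (purely illustrative, not the actual implementation): tokens accumulate
> > at rate R up to a burst B while the client is idle, and a burst only
> > draws on this accumulated balance instead of borrowing from future
> > windows.
> >
> >     // Illustrative sketch only.
> >     class TokenBucket {
> >         private final double rate;   // R, tokens per second
> >         private final double burst;  // B, maximum balance
> >         private double tokens;
> >         private long lastMs;
> >
> >         TokenBucket(double rate, double burst, long nowMs) {
> >             this.rate = rate;
> >             this.burst = burst;
> >             this.tokens = burst;
> >             this.lastMs = nowMs;
> >         }
> >
> >         synchronized long throttleTimeMs(double cost, long nowMs) {
> >             // refill from the idle period, capped at the burst
> >             tokens = Math.min(burst,
> >                 tokens + rate * (nowMs - lastMs) / 1000.0);
> >             lastMs = nowMs;
> >             tokens -= cost;          // the balance may go negative
> >             if (tokens >= 0)
> >                 return 0;
> >             // time until the balance is back to zero
> >             return (long) (-tokens * 1000.0 / rate);
> >         }
> >     }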
> >
> > In my previous email, I mentioned the issue we observed with the
> > bandwidth quota, where a low quota (1MB/s per broker) was limiting
> > bandwidth visibly below the quota. I thought it was strictly an issue
> > with the Rate implementation as well, but I found the root cause to be
> > different, although amplified by the Rate implementation (long throttle
> > delays of requests in a burst). I will describe it here for
> > completeness using the following example:
> >
> >    - Quota = 1MB/s, default window size and number of samples.
> >    - Suppose there are 6 connections (maximum 6 outstanding requests),
> >      and each produce request is 5MB. If all requests arrive in a
> >      burst, the last 4 requests (20MB over the 10MB allowed in a
> >      window) may get the same throttle time if they are processed
> >      concurrently. We record the rate under the lock, but then
> >      calculate the throttle time separately after that. So, for each
> >      request, the observed rate could be 3MB/s, and each request gets
> >      throttle delay = 20 seconds (instead of 5, 10, 15, 20
> >      respectively; see the worked numbers below). The delay is longer
> >      than the total rate window, which results in lower bandwidth than
> >      the quota. Since all requests got the same delay, they will also
> >      arrive in a burst, which may again result in a longer delay than
> >      necessary. It looks pretty easy to fix, so I will open a separate
> >      JIRA for it. This can be additionally mitigated by the token
> >      bucket behavior.
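> >
> > The worked numbers behind the 20 seconds (assuming the default of 11
> > one-second samples, i.e. a roughly 10-second rate window):
> >
> >     // Illustrative arithmetic for the example above.
> >     double quotaMBps = 1.0;
> >     double windowSec = 10.0;
> >     double burstMB = 6 * 5.0;               // 30MB arrives at once
> >     double observed = burstMB / windowSec;  // 3MB/s for every request
> >     // delay needed for the rate to fall back to the quota
> >     double delaySec = (observed - quotaMBps) * windowSec / quotaMBps; // 20s
> >     // All 4 over-quota requests get the same 20s delay because the rate
> >     // is recorded under the lock but the delay is computed afterwards,
> >     // instead of the staggered 5s, 10s, 15s and 20s that a sequential
> >     // record-and-compute would produce.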
> >
> >
> > For the algorithm "So instead of having one sample equal to 560 in the
> > last window, we will have 100 samples equal to 5.6.", I agree with Jun.
> > I would allocate 5 to each old sample that is still in the overall
> > window. It would be a bit coarser granularity than the pure token
> > bucket (we lose 5 units / mutation once we move past the sample
> > window), but it is better than the long delay.
> >
> > Thanks,
> >
> > Anna
> >
> >
> > On Thu, Jun 4, 2020 at 6:33 PM Jun Rao <j...@confluent.io> wrote:
> >
> > > Hi, David, Anna,
> > >
> > > Thanks for the discussion and the updated wiki.
> > >
> > > 11. If we believe the token bucket behavior is better in terms of
> > > handling the burst behavior, we probably don't need a separate KIP
> > > since it's just an implementation detail.
> > >
> > > Regarding "So instead of having one sample equal to 560 in the last
> > > window, we will have 100 samples equal to 5.6.", I was thinking that
> > > we will allocate 5 to each of the first 99 samples and 65 to the last
> > > sample. Then, 6 new samples have to come before the balance becomes 0
> > > again. Intuitively, we are accumulating credits in each sample. If a
> > > usage comes in, we first use all existing credits to offset that. If
> > > we can't, the remaining usage will be recorded in the last sample,
> > > which will be offset by future credits. That seems to match the token
> > > bucket behavior the closest.
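> > >
> > > A sketch of that allocation, to make it concrete (illustrative only,
> > > assuming 100 one-second samples and a quota of 5 mutations/sec):
> > >
> > >     // Spread a burst of usage across the samples, each up to the
> > >     // per-sample quota; the remainder goes to the last sample.
> > >     int samples = 100, quotaPerSample = 5;
> > >     int usage = 560;              // e.g. 7 topics * 80 partitions
> > >     int[] window = new int[samples];
> > >     for (int i = 0; i < samples && usage > 0; i++) {
> > >         int credit = Math.min(quotaPerSample, usage);
> > >         window[i] = credit;       // 5 in each of the first 99 samples
> > >         usage -= credit;
> > >     }
> > >     window[samples - 1] += usage; // the last sample ends up with 65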
> > >
> > > 20. Could you provide some guidelines on the typical rate that an admin
> > > should set?
> > >
> > > Jun
> > >
> > > On Thu, Jun 4, 2020 at 8:22 AM David Jacot <dja...@confluent.io>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I just published an updated version of the KIP which includes:
> > > > * Using a slightly modified version of our Rate. I have tried to
> > > >   formalize it based on our discussion. As Anna suggested, we may
> > > >   find a better way to implement it.
> > > > * Handling of ValidateOnly as pointed out by Tom.
> > > >
> > > > Please, check it out and let me know what you think.
> > > >
> > > > Best,
> > > > David
> > > >
> > > > On Thu, Jun 4, 2020 at 4:57 PM Tom Bentley <tbent...@redhat.com>
> > > > wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > > As a user I might expect the validateOnly option to do everything
> > > > > except actually make the changes. That interpretation would imply
> > > > > the quota should be checked, but the check should obviously be
> > > > > side-effect free. I think this interpretation could be useful
> > > > > because it gives the caller either some confidence that they're
> > > > > not going to hit the quota, or tells them, via the exception, when
> > > > > they can expect the call to work. But for this to be useful it
> > > > > would require the retry logic to not retry the request when
> > > > > validateOnly was set.
> > > > >
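> > > > > For reference, this is the option in question on the AdminClient
> > > > > side (standard public API; the topic name, settings, and props
> > > > > here are just for illustration):
> > > > >
> > > > >     // classes from org.apache.kafka.clients.admin and java.util
> > > > >     Properties props = new Properties();
> > > > >     props.put("bootstrap.servers", "localhost:9092");
> > > > >     Admin admin = Admin.create(props);
> > > > >     NewTopic topic = new NewTopic("my-topic", 50, (short) 3);
> > > > >     CreateTopicsResult result = admin.createTopics(
> > > > >         Collections.singleton(topic),
> > > > >         new CreateTopicsOptions().validateOnly(true)); // no changes
> > > > >     // under the first interpretation above, this would surface a
> > > > >     // quota error without consuming any quota
> > > > >     result.all().get();
> > > > >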
> > > > > On the other hand, if validateOnly is really about validating
> > > > > only some aspects of the request (which maybe is what the name
> > > > > implies), then we should clarify in the Javadoc that the quota is
> > > > > not included in the validation.
> > > > >
> > > > > On balance, I agree with what you're proposing, since the extra
> > > > > utility of including the quota in the validation seems to be not
> > > > > worth the extra complication for the retry.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Tom
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Jun 4, 2020 at 3:32 PM David Jacot <dja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi Tom,
> > > > > >
> > > > > > That's a good question. As the validation does not create any
> > > > > > load on the controller, I was planning to do it without checking
> > > > > > the quota at all. Does that sound reasonable?
> > > > > >
> > > > > > Best,
> > > > > > David
> > > > > >
> > > > > > On Thu, Jun 4, 2020 at 4:23 PM David Jacot <dja...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Jun and Anna,
> > > > > > >
> > > > > > > Thank you both for your replies.
> > > > > > >
> > > > > > > Based on our recent discussion, I agree that using a rate is
> > > > > > > better to remain consistent with other quotas. As you both
> > > > > > > suggested, it seems that changing the way we compute the rate
> > > > > > > to better handle spiky workloads and behave a bit more
> > > > > > > similarly to the token bucket algorithm makes sense for all
> > > > > > > quotas as well.
> > > > > > >
> > > > > > > I will update the KIP to reflect this.
> > > > > > >
> > > > > > > Anna, I think that we can explain this in this KIP. We can't
> > > > > > > just say that the Rate will be updated in this KIP. I think
> > > > > > > that we need to give a bit more info.
> > > > > > >
> > > > > > > Best,
> > > > > > > David
> > > > > > >
> > > > > > > On Thu, Jun 4, 2020 at 6:31 AM Anna Povzner <a...@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Jun and David,
> > > > > > >>
> > > > > > >> Regarding token bucket vs. Rate behavior: we recently
> > > > > > >> observed a couple of cases where a bursty workload would
> > > > > > >> result in long-ish pauses in between, resulting in lower
> > > > > > >> overall bandwidth than the quota. I will need to debug this a
> > > > > > >> bit more to be 100% sure, but it does look like the case
> > > > > > >> described by David earlier in this thread. So, I agree with
> > > > > > >> Jun -- I think we should make all quota rate behavior
> > > > > > >> consistent, and probably similar to the token bucket one.
> > > > > > >>
> > > > > > >> Looking at KIP-13, it doesn't describe the rate calculation
> > > > > > >> in enough detail, but it does mention window size. So, we
> > > > > > >> could keep the "window size" and "number of samples" configs
> > > > > > >> and change the Rate implementation to be more similar to a
> > > > > > >> token bucket:
> > > > > > >> * the number of samples defines our burst size;
> > > > > > >> * change the behavior, which could be described as: if a
> > > > > > >> burst happens after an idle period, the burst effectively
> > > > > > >> spreads evenly over the (now - window) time period, where
> > > > > > >> window is (<number of samples> - 1) * <window size>. This
> > > > > > >> basically describes a token bucket, while keeping the current
> > > > > > >> quota configs. I think we can even implement this by changing
> > > > > > >> the way we record the last sample or lastWindowMs.
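> > > > > > >>
> > > > > > >> A rough sketch of the adjusted measurement (hypothetical and
> > > > > > >> simplified, just to show the idea): force the elapsed time
> > > > > > >> used in the rate calculation to cover the full configured
> > > > > > >> window, so a burst recorded after an idle period is averaged
> > > > > > >> over the whole window rather than over the short time since
> > > > > > >> the burst started.
> > > > > > >>
> > > > > > >>     // Hypothetical, simplified sketch.
> > > > > > >>     double measure(double total, long nowMs,
> > > > > > >>                    long oldestSampleMs, int numSamples,
> > > > > > >>                    long windowSizeMs) {
> > > > > > >>         long fullWindowMs = (numSamples - 1) * windowSizeMs;
> > > > > > >>         long elapsedMs = Math.max(nowMs - oldestSampleMs,
> > > > > > >>                                   fullWindowMs);
> > > > > > >>         return total * 1000.0 / elapsedMs; // per-second rate
> > > > > > >>     }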
> > > > > > >>
> > > > > > >> Jun, if we were to change the Rate calculation behavior in
> > > > > > >> the bandwidth and request quotas, would we need a separate
> > > > > > >> KIP? We shouldn't need one if we keep the window size and
> > > > > > >> number of samples configs, right?
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Anna
> > > > > > >>
> > > > > > >> On Wed, Jun 3, 2020 at 3:24 PM Jun Rao <j...@confluent.io>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Hi, David,
> > > > > > >> >
> > > > > > >> > Thanks for the reply.
> > > > > > >> >
> > > > > > >> > 11. To match the behavior of the token bucket approach, I
> > > > > > >> > was thinking that requests that don't fit in the previous
> > > > > > >> > time windows would be accumulated in the current time
> > > > > > >> > window. So, the 60 extra requests will be accumulated in the
> > > > > > >> > latest window. Then, the client also has to wait for 12 more
> > > > > > >> > secs before throttling is removed. I agree that this is
> > > > > > >> > probably a better behavior and it's reasonable to change the
> > > > > > >> > existing behavior to this one.
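> > > > > > >> >
> > > > > > >> > The arithmetic behind the 12 secs, for clarity (assuming
> > > > > > >> > the toy example's quota of 5 requests/sec, with
> > > > > > >> > 5 * 100 = 500 requests of capacity in the previous windows):
> > > > > > >> >
> > > > > > >> >     int overflow = 7 * 80 - 100 * 5; // 60 requests over quota
> > > > > > >> >     int waitSec = overflow / 5;      // 12 more secs throttled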
> > > > > > >> >
> > > > > > >> > To me, it seems that sample_size * num_windows is the same
> > > > > > >> > as the max burst balance. The latter seems a bit better to
> > > > > > >> > configure. The thing is that the existing quota system has
> > > > > > >> > already been used in quite a few places and, if we want to
> > > > > > >> > change the configuration in the future, there is the
> > > > > > >> > migration cost. Given that, do you feel it's better to adopt
> > > > > > >> > the new token bucket terminology or just to adopt the
> > > > > > >> > behavior somehow into our existing system? If it's the
> > > > > > >> > former, it would be useful to document this in the rejected
> > > > > > >> > alternatives section and add a future plan on migrating
> > > > > > >> > existing quota configurations.
> > > > > > >> >
> > > > > > >> > Jun
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Tue, Jun 2, 2020 at 6:55 AM David Jacot
> > > > > > >> > <dja...@confluent.io> wrote:
> > > > > > >> >
> > > > > > >> > > Hi Jun,
> > > > > > >> > >
> > > > > > >> > > Thanks for your reply.
> > > > > > >> > >
> > > > > > >> > > 10. I think that both options are likely equivalent from
> > > > > > >> > > an accuracy point of view. If we put the implementation
> > > > > > >> > > aside, conceptually, I am not convinced by the usage-based
> > > > > > >> > > approach because resources don't have a clear owner in AK
> > > > > > >> > > at the moment. A topic can be created by (Principal A, no
> > > > > > >> > > client id), then be extended by (no principal, Client B),
> > > > > > >> > > and finally deleted by (Principal C, Client C). This does
> > > > > > >> > > not sound right to me and I fear that it is not going to
> > > > > > >> > > be easy to grasp for our users.
> > > > > > >> > >
> > > > > > >> > > Regarding the naming, I do agree that we can make it more
> > > > > > >> > > future proof. I propose `controller_mutations_rate`. I
> > > > > > >> > > think that differentiating the mutations from the reads is
> > > > > > >> > > still a good thing for the future.
> > > > > > >> > >
> > > > > > >> > > 11. I am not convinced by your proposal, for the
> > > > > > >> > > following reasons:
> > > > > > >> > >
> > > > > > >> > > First, in my toy example, I used 101 windows and 7 * 80
> > > > > > >> > > requests. We could effectively allocate 5 * 100 requests
> > > > > > >> > > to the previous windows, assuming that they are empty.
> > > > > > >> > > What shall we do with the remaining 60 requests? Shall we
> > > > > > >> > > allocate them to the current window or shall we divide
> > > > > > >> > > them among all the windows?
> > > > > > >> > >
> > > > > > >> > > Second, I don't think that we can safely change the
> > > > > > >> > > behavior of all the existing rates, because it actually
> > > > > > >> > > changes the computation of the rate: values allocated to
> > > > > > >> > > past windows would expire earlier than they do today.
> > > > > > >> > >
> > > > > > >> > > Overall, while trying to fit in the current rate, we are
> > > > > > >> > > going to build a slightly different version of the rate
> > > > > > >> > > which will be even more confusing for users.
> > > > > > >> > >
> > > > > > >> > > Instead, I think that we should embrace the notion of
> > > > > > >> > > burst as it could also be applied to other quotas in the
> > > > > > >> > > future. Users don't have to know that we use the Token
> > > > > > >> > > Bucket or a special rate inside at the end of the day. It
> > > > > > >> > > is an implementation detail.
> > > > > > >> > >
> > > > > > >> > > Users would be able to define:
> > > > > > >> > > - a rate R; and
> > > > > > >> > > - a maximum burst B.
> > > > > > >> > >
> > > > > > >> > > If we change the metrics to be as follows:
> > > > > > >> > > - the actual rate;
> > > > > > >> > > - the burst balance in %, 0 meaning that the user is
> > > > > > >> > >   throttled;
> > > > > > >> > > they remain detached from the algorithm.
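> > > > > > >> > >
> > > > > > >> > > For instance, the burst balance metric could be derived
> > > > > > >> > > as follows (illustrative only; the metric name and exact
> > > > > > >> > > derivation are not part of the proposal):
> > > > > > >> > >
> > > > > > >> > >     // remaining burst as a percentage; 0 means throttled
> > > > > > >> > >     static double burstBalancePercent(double tokens,
> > > > > > >> > >                                       double burst) {
> > > > > > >> > >         return Math.max(0.0, tokens) / burst * 100.0;
> > > > > > >> > >     }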
> > > > > > >> > >
> > > > > > >> > > I personally prefer this over having to define a rate and
> > > > > > >> > > a number of windows while having to understand that the
> > > > > > >> > > number of windows implicitly defines the allowed burst. I
> > > > > > >> > > think that it is clearer and easier to grasp. Don't you?
> > > > > > >> > >
> > > > > > >> > > Best,
> > > > > > >> > > David
> > > > > > >> > >
> > > > > > >> > > On Fri, May 29, 2020 at 6:38 PM Jun Rao <j...@confluent.io>
> > > > > > >> > > wrote:
> > > > > > >> > >
> > > > > > >> > > > Hi, David, Anna,
> > > > > > >> > > >
> > > > > > >> > > > Thanks for the response. Sorry for the late reply.
> > > > > > >> > > >
> > > > > > >> > > > 10. Regarding exposing rate or usage as the quota: your
> > > > > > >> > > > argument is that usage is not very accurate anyway and
> > > > > > >> > > > is harder to implement, so let's just be a bit loose and
> > > > > > >> > > > expose rate. I am sort of neutral on that. (1) It seems
> > > > > > >> > > > to me that overall usage will be relatively more
> > > > > > >> > > > accurate than rate. All the issues that Anna brought up
> > > > > > >> > > > seem to exist for rate too. Rate has the additional
> > > > > > >> > > > problem that the cost of each request may not be
> > > > > > >> > > > uniform. (2) In terms of implementation, a usage-based
> > > > > > >> > > > approach requires tracking the user info through the
> > > > > > >> > > > life cycle of an operation. However, as you mentioned,
> > > > > > >> > > > things like topic creation can generate additional
> > > > > > >> > > > LeaderAndIsr/UpdateMetadata requests. Longer term, we
> > > > > > >> > > > probably want to associate those costs with the user who
> > > > > > >> > > > initiated the operation. If we do that, we sort of need
> > > > > > >> > > > to track the user for the full life cycle of the
> > > > > > >> > > > processing of an operation anyway. (3) If you prefer
> > > > > > >> > > > rate strongly, I don't have strong objections. However,
> > > > > > >> > > > I do feel that the new quota name should be able to
> > > > > > >> > > > cover all controller-related costs in the longer term.
> > > > > > >> > > > This KIP currently only covers topic creation/deletion.
> > > > > > >> > > > It would not be ideal if, in the future, we have to add
> > > > > > >> > > > yet another type of quota for some other
> > > > > > >> > > > controller-related operations. The quota name in the KIP
> > > > > > >> > > > has partition mutation. In the future, if we allow
> > > > > > >> > > > things like topic renaming, it may not be related to
> > > > > > >> > > > partition mutation directly and it would be trickier to
> > > > > > >> > > > fit those operations in the current quota. So, maybe
> > > > > > >> > > > something more general like controller_operations_quota
> > > > > > >> > > > would be more future proof.
> > > > > > >> > > >
> > > > > > >> > > > 11. Regarding the difference between the token bucket
> > > > > > >> > > > algorithm and our current quota mechanism: that's a good
> > > > > > >> > > > observation. It seems that we can make a slight change
> > > > > > >> > > > to our current quota mechanism to achieve a similar
> > > > > > >> > > > thing. As you said, the issue was that we allocate all
> > > > > > >> > > > 7 * 80 requests to the last 1 sec window in our current
> > > > > > >> > > > mechanism. This is a bit unintuitive since each sec only
> > > > > > >> > > > has a quota capacity of 5. An alternative way is to
> > > > > > >> > > > allocate the 7 * 80 requests to all previous windows,
> > > > > > >> > > > each up to its remaining quota capacity. So, you will
> > > > > > >> > > > fill the current 1 sec window and all previous ones,
> > > > > > >> > > > each with 5. Then, it seems this will give the same
> > > > > > >> > > > behavior as the token bucket? The reason that I keep
> > > > > > >> > > > asking this is that, from an operational perspective,
> > > > > > >> > > > it's simpler if all types of quotas work in the same
> > > > > > >> > > > way.
> > > > > > >> > > >
> > > > > > >> > > > Jun
> > > > > > >> > > >
> > > > > > >> > > > On Tue, May 26, 2020 at 9:52 AM David Jacot
> > > > > > >> > > > <dja...@confluent.io> wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Hi folks,
> > > > > > >> > > > >
> > > > > > >> > > > > I have updated the KIP. As mentioned by Jun, I have
> > > > > > >> > > > > made the quota per principal/clientid, similarly to
> > > > > > >> > > > > the other quotas. I have also explained how this will
> > > > > > >> > > > > work in conjunction with the auto topics creation.
> > > > > > >> > > > >
> > > > > > >> > > > > Please, take a look at it. I plan to call a vote for
> > > > > > >> > > > > it in the next few days if there are no comments in
> > > > > > >> > > > > this thread.
> > > > > > >> > > > >
> > > > > > >> > > > > Best,
> > > > > > >> > > > > David
> > > > > > >> > > > >
> > > > > > >> > > > > On Wed, May 13, 2020 at 10:57 AM Tom Bentley
> > > > > > >> > > > > <tbent...@redhat.com> wrote:
> > > > > > >> > > > >
> > > > > > >> > > > > > Hi David,
> > > > > > >> > > > > >
> > > > > > >> > > > > > Thanks for the explanation and confirmation that
> > > > > > >> > > > > > evolving the APIs is not off the table in the longer
> > > > > > >> > > > > > term.
> > > > > > >> > > > > >
> > > > > > >> > > > > > Kind regards,
> > > > > > >> > > > > >
> > > > > > >> > > > > > Tom
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
