Hi Anna and Jun,

You are right. We should allocate up to the quota for each old sample.

I have revamped the Throttling Algorithm section to better explain our
thought process and the token bucket inspiration.

I have also added a chapter with a few guidelines on how to define
the quota. There is no magic formula for this, but I give a few insights.
I don't have specific numbers that can be used out of the box, so I
think it is better not to put any for the time being. We can always
complement the documentation later on.

Please, take a look and let me know what you think.

Cheers,
David

On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner <a...@confluent.io> wrote:

> Hi David and Jun,
>
> I dug a bit deeper into the Rate implementation, and I want to confirm
> that I do believe the token bucket behavior is better, for the reasons we
> already discussed and summarized here. The main difference between Rate
> and token bucket is that the Rate implementation allows a burst by
> borrowing from the future, whereas a token bucket allows a burst by using
> accumulated tokens from the previous idle period. Using accumulated tokens
> smooths out the rate measurement in general. Configuring a large burst
> requires configuring a large quota window, which causes long delays for
> bursty workloads, due to borrowing credits from the future. Perhaps it
> would be useful to add a summary at the beginning of the Throttling
> Algorithm section?
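>
> For reference, here is a minimal token-bucket sketch (an illustration of
> the concept only, not Kafka's actual Rate code; all names are made up):
>
>     // Classic token bucket: burst capacity comes from tokens accumulated
>     // while idle, never borrowed from the future.
>     class TokenBucket {
>         private final double ratePerSec;  // tokens added per second
>         private final double capacity;    // maximum burst size
>         private double tokens;
>         private long lastMs;
>
>         TokenBucket(double ratePerSec, double capacity, long nowMs) {
>             this.ratePerSec = ratePerSec;
>             this.capacity = capacity;
>             this.tokens = capacity;
>             this.lastMs = nowMs;
>         }
>
>         // Admit the request if enough tokens have accumulated.
>         synchronized boolean tryAcquire(double cost, long nowMs) {
>             tokens = Math.min(capacity,
>                 tokens + ratePerSec * (nowMs - lastMs) / 1000.0);
>             lastMs = nowMs;
>             if (tokens < cost)
>                 return false;
>             tokens -= cost;
>             return true;
>         }
>     }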
>
> In my previous email, I mentioned the issue we observed with the bandwidth
> quota, where a low quota (1MB/s per broker) was limiting bandwidth visibly
> below the quota. I thought it was strictly an issue with the Rate
> implementation as well, but I found the root cause to be different, though
> amplified by the Rate implementation (long throttle delays of requests in
> a burst). I will describe it here for completeness using the following
> example:
>    - Quota = 1MB/s, default window size and number of samples.
>
>    - Suppose there are 6 connections (maximum 6 outstanding requests), and
>      each produce request is 5MB. If all requests arrive in a burst, the
>      last 4 requests (20MB over the 10MB allowed in a window) may get the
>      same throttle time if they are processed concurrently. We record the
>      rate under the lock, but then calculate the throttle time separately
>      after that. So, for each request, the observed rate could be 3MB/s,
>      and each request gets throttle delay = 20 seconds (instead of 5, 10,
>      15, 20 respectively). The delay is longer than the total rate window,
>      which results in lower bandwidth than the quota. Since all requests
>      got the same delay, they will also arrive in a burst, which may also
>      result in a longer delay than necessary. It looks pretty easy to fix,
>      so I will open a separate JIRA for it. This can be additionally
>      mitigated by the token bucket behavior; see the sketch below.
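>
> To make the staggered-delay fix concrete, here is a hypothetical sketch
> (made-up names, not the actual broker code): if the recording and the
> throttle-time computation happen under the same lock, each request
> observes the bytes recorded by the requests before it:
>
>     // Hypothetical sketch: quota = 1MB/s, total window = 10s. Six 5MB
>     // requests in a burst get delays 0, 0, 5, 10, 15, 20 seconds instead
>     // of the last four all getting 20 seconds.
>     class ThrottleCalc {
>         private final double quotaBytesPerSec;
>         private final double windowSec;
>         private double recordedBytes = 0;
>
>         ThrottleCalc(double quotaBytesPerSec, double windowSec) {
>             this.quotaBytesPerSec = quotaBytesPerSec;
>             this.windowSec = windowSec;
>         }
>
>         // Record a request and compute its delay atomically.
>         synchronized double recordAndDelaySec(double bytes) {
>             recordedBytes += bytes;
>             double overage = recordedBytes - quotaBytesPerSec * windowSec;
>             return Math.max(0, overage / quotaBytesPerSec);
>         }
>     }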
>
>
> For the algorithm "So instead of having one sample equal to 560 in the last
> window, we will have 100 samples equal to 5.6.", I agree with Jun. I would
> allocate 5 to each old sample that is still in the overall window. It
> would be a bit larger granularity than a pure token bucket (we lose 5
> units / mutation once we move past a sample window), but it is better
> than the long delay.
>
> Thanks,
>
> Anna
>
>
> On Thu, Jun 4, 2020 at 6:33 PM Jun Rao <j...@confluent.io> wrote:
>
> > Hi, David, Anna,
> >
> > Thanks for the discussion and the updated wiki.
> >
> > 11. If we believe the token bucket behavior is better in terms of
> > handling the burst behavior, we probably don't need a separate KIP since
> > it's just an implementation detail.
> >
> > Regarding "So instead of having one sample equal to 560 in the last
> > window, we will have 100 samples equal to 5.6.", I was thinking that we
> > will allocate 5 to each of the first 99 samples and 65 to the last
> > sample. Then, 6 new samples have to come before the balance becomes 0
> > again. Intuitively, we are accumulating credits in each sample. If a
> > usage comes in, we first use all existing credits to offset that. If we
> > can't, the remaining usage will be recorded in the last sample, which
> > will be offset by future credits. That seems to match the token bucket
> > behavior the closest.
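> >
> > A small sketch of that allocation (illustrative only; the sample
> > bookkeeping in the real Rate implementation differs):
> >
> >     // Fill each existing sample up to the per-sample quota; the
> >     // remainder lands in the last (current) sample. With quota = 5 per
> >     // sample, 100 samples and a burst of 560, the first 99 samples get
> >     // 5 each and the last gets 560 - 99 * 5 = 65.
> >     static void recordBurst(double[] samples, double quotaPerSample,
> >                             double burst) {
> >         int last = samples.length - 1;
> >         for (int i = 0; i < last && burst > 0; i++) {
> >             double credit = Math.max(0, quotaPerSample - samples[i]);
> >             double used = Math.min(credit, burst);
> >             samples[i] += used;
> >             burst -= used;
> >         }
> >         samples[last] += burst; // remaining usage in the current sample
> >     }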
> >
> > 20. Could you provide some guidelines on the typical rate that an admin
> > should set?
> >
> > Jun
> >
> > On Thu, Jun 4, 2020 at 8:22 AM David Jacot <dja...@confluent.io> wrote:
> >
> > > Hi all,
> > >
> > > I just published an updated version of the KIP which includes:
> > > * Using a slightly modified version of our Rate. I have tried to
> > >   formalize it based on our discussion. As Anna suggested, we may find
> > >   a better way to implement it.
> > > * Handling of ValidateOnly as pointed out by Tom.
> > >
> > > Please, check it out and let me know what you think.
> > >
> > > Best,
> > > David
> > >
> > > On Thu, Jun 4, 2020 at 4:57 PM Tom Bentley <tbent...@redhat.com> wrote:
> > >
> > > > Hi David,
> > > >
> > > > As a user I might expect the validateOnly option to do everything
> > > > except actually make the changes. That interpretation would imply the
> > > > quota should be checked, but the check should obviously be
> > > > side-effect free. I think this interpretation could be useful because
> > > > it gives the caller either some confidence that they're not going to
> > > > hit the quota, or tells them, via the exception, when they can expect
> > > > the call to work. But for this to be useful it would require the
> > > > retry logic to not retry the request when validateOnly was set.
> > > >
> > > > On the other hand, if validateOnly is really about validating only
> > > > some aspects of the request (which maybe is what the name implies),
> > > > then we should clarify in the Javadoc that the quota is not included
> > > > in the validation.
> > > >
> > > > On balance, I agree with what you're proposing, since the extra
> > > > utility of including the quota in the validation seems to be not
> > > > worth the extra complication for the retry.
> > > >
> > > > Thanks,
> > > >
> > > > Tom
> > > >
> > > >
> > > >
> > > > On Thu, Jun 4, 2020 at 3:32 PM David Jacot <dja...@confluent.io> wrote:
> > > >
> > > > > Hi Tom,
> > > > >
> > > > > That's a good question. As the validation does not create any load
> > > > > on the controller, I was planning to do it without checking the
> > > > > quota at all. Does that sound reasonable?
> > > > >
> > > > > Best,
> > > > > David
> > > > >
> > > > > On Thu, Jun 4, 2020 at 4:23 PM David Jacot <dja...@confluent.io> wrote:
> > > > >
> > > > > > Hi Jun and Anna,
> > > > > >
> > > > > > Thank you both for your replies.
> > > > > >
> > > > > > Based on our recent discussion, I agree that using a rate is
> > > > > > better to remain consistent with other quotas. As you both
> > > > > > suggested, it seems that changing the way we compute the rate to
> > > > > > better handle spiky workloads and behave a bit more similarly to
> > > > > > the token bucket algorithm makes sense for all quotas as well.
> > > > > >
> > > > > > I will update the KIP to reflect this.
> > > > > >
> > > > > > Anna, I think that we should explain this in this KIP. We can't
> > > > > > just say that the Rate will be updated; we need to give a bit
> > > > > > more info.
> > > > > >
> > > > > > Best,
> > > > > > David
> > > > > >
> > > > > > On Thu, Jun 4, 2020 at 6:31 AM Anna Povzner <a...@confluent.io> wrote:
> > > > > >
> > > > > >> Hi Jun and David,
> > > > > >>
> > > > > >> Regarding token bucket vs. Rate behavior: we recently observed a
> > > > > >> couple of cases where a bursty workload would result in long-ish
> > > > > >> pauses in between, resulting in lower overall bandwidth than the
> > > > > >> quota. I will need to debug this a bit more to be 100% sure, but
> > > > > >> it does look like the case described by David earlier in this
> > > > > >> thread. So, I agree with Jun -- I think we should make all quota
> > > > > >> rate behavior consistent, and probably similar to the token
> > > > > >> bucket one.
> > > > > >>
> > > > > >> Looking at KIP-13, it doesn't describe rate calculation in
> > > > > >> enough detail, but does mention window size. So, we could keep
> > > > > >> the "window size" and "number of samples" configs and change the
> > > > > >> Rate implementation to be more similar to a token bucket:
> > > > > >> * The number of samples defines our burst size.
> > > > > >> * Change the behavior, which could be described as: if a burst
> > > > > >> happens after an idle period, the burst effectively spreads
> > > > > >> evenly over the (now - window) time period, where window is
> > > > > >> (<number of samples> - 1) * <window size>. This basically
> > > > > >> describes a token bucket, while keeping the current quota
> > > > > >> configs. I think we can even implement this by changing the
> > > > > >> way we record the last sample or lastWindowMs.
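> > > > > >>
> > > > > >> In other words (a sketch of the intended math, not the current
> > > > > >> code): observed rate = burst / ((<number of samples> - 1) *
> > > > > >> <window size>). For example, an 11-sample, 1-second-window config
> > > > > >> spreads a 100-unit burst over 10 seconds, for an observed rate
> > > > > >> of 10 units/sec.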
> > > > > >>
> > > > > >> Jun, if we change the Rate calculation behavior in the bandwidth
> > > > > >> and request quotas, would we need a separate KIP? We shouldn't
> > > > > >> need one if we keep the window size and number of samples
> > > > > >> configs, right?
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Anna
> > > > > >>
> > > > > >> On Wed, Jun 3, 2020 at 3:24 PM Jun Rao <j...@confluent.io> wrote:
> > > > > >>
> > > > > >> > Hi, David,
> > > > > >> >
> > > > > >> > Thanks for the reply.
> > > > > >> >
> > > > > >> > 11. To match the behavior in the token bucket approach, I was
> > > > > >> > thinking that requests that don't fit in the previous time
> > > > > >> > windows will be accumulated in the current time window. So,
> > > > > >> > the 60 extra requests will be accumulated in the latest
> > > > > >> > window. Then, the client also has to wait for 12 more secs
> > > > > >> > before throttling is removed. I agree that this is probably a
> > > > > >> > better behavior and it's reasonable to change the existing
> > > > > >> > behavior to this one.
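> > > > > >> >
> > > > > >> > (To spell out the arithmetic: with a quota of 5 requests/sec,
> > > > > >> > the 60 extra requests drain at 5 per second, hence
> > > > > >> > 60 / 5 = 12 more secs.)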
> > > > > >> >
> > > > > >> > To me, it seems that sample_size * num_windows is the same as
> > > > > >> > max burst balance. The latter seems a bit better to configure.
> > > > > >> > The thing is that the existing quota system has already been
> > > > > >> > used in quite a few places, and if we want to change the
> > > > > >> > configuration in the future, there is a migration cost. Given
> > > > > >> > that, do you feel it's better to adopt the new token bucket
> > > > > >> > terminology or just adopt the behavior somehow into our
> > > > > >> > existing system? If it's the former, it would be useful to
> > > > > >> > document this in the rejected alternatives section and add a
> > > > > >> > future plan on migrating existing quota configurations.
> > > > > >> >
> > > > > >> > Jun
> > > > > >> >
> > > > > >> >
> > > > > >> > On Tue, Jun 2, 2020 at 6:55 AM David Jacot <dja...@confluent.io> wrote:
> > > > > >> >
> > > > > >> > > Hi Jun,
> > > > > >> > >
> > > > > >> > > Thanks for your reply.
> > > > > >> > >
> > > > > >> > > 10. I think that both options are likely equivalent from an
> > > > > >> > > accuracy point of view. If we put the implementation aside,
> > > > > >> > > conceptually, I am not convinced by the usage-based approach
> > > > > >> > > because resources don't have a clear owner in AK at the
> > > > > >> > > moment. A topic can be created by (Principal A, no client
> > > > > >> > > id), then can be extended by (no principal, Client B), and
> > > > > >> > > finally deleted by (Principal C, Client C). This does not
> > > > > >> > > sound right to me and I fear that it is not going to be easy
> > > > > >> > > to grasp for our users.
> > > > > >> > >
> > > > > >> > > Regarding the naming, I do agree that we can make it more
> > > > > >> > > future proof. I propose `controller_mutations_rate`. I think
> > > > > >> > > that differentiating the mutations from the reads is still a
> > > > > >> > > good thing for the future.
> > > > > >> > >
> > > > > >> > > 11. I am not convinced by your proposal for the following
> > > > > >> > > reasons:
> > > > > >> > >
> > > > > >> > > First, in my toy example, I used 101 windows and 7 * 80
> > > > > >> > > requests. We could effectively allocate 5 * 100 requests to
> > > > > >> > > the previous windows, assuming that they are empty. What
> > > > > >> > > shall we do with the remaining 60 requests? Shall we allocate
> > > > > >> > > them to the current window or shall we divide them among all
> > > > > >> > > the windows?
> > > > > >> > >
> > > > > >> > > Second, I don't think that we can safely change the behavior
> > > > > >> > > of all the existing rates because it actually changes the
> > > > > >> > > computation of the rate, as values allocated to past windows
> > > > > >> > > would expire earlier than they do today.
> > > > > >> > >
> > > > > >> > > Overall, while trying to fit into the current Rate, we are
> > > > > >> > > going to build a slightly different version of it, which
> > > > > >> > > will be even more confusing for users.
> > > > > >> > >
> > > > > >> > > Instead, I think that we should embrace the notion of burst,
> > > > > >> > > as it could also be applied to other quotas in the future.
> > > > > >> > > Users don't have to know that we use a token bucket or a
> > > > > >> > > special rate inside at the end of the day. It is an
> > > > > >> > > implementation detail.
> > > > > >> > >
> > > > > >> > > Users would be able to define:
> > > > > >> > > - a rate R; and
> > > > > >> > > - a maximum burst B.
> > > > > >> > >
> > > > > >> > > If we change the metrics to be as follows:
> > > > > >> > > - the actual rate;
> > > > > >> > > - the burst balance in %, 0 meaning that the user is
> > > > > >> > > throttled;
> > > > > >> > > then the interface remains detached from the algorithm.
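> > > > > >> > >
> > > > > >> > > For illustration, the configuration could look something
> > > > > >> > > like this (hypothetical names, just to make the proposal
> > > > > >> > > concrete; controller_mutations_rate is from my proposal
> > > > > >> > > above, the burst name is made up):
> > > > > >> > >
> > > > > >> > >     controller_mutations_rate=10    # R: mutations per second
> > > > > >> > >     controller_mutations_burst=100  # B: max accumulated credit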
> > > > > >> > >
> > > > > >> > > I personally prefer this over having to define a rate and a
> > > > > >> > > number of windows while having to understand that the number
> > > > > >> > > of windows implicitly defines the allowed burst. I think that
> > > > > >> > > it is clearer and easier to grasp. Don't you?
> > > > > >> > >
> > > > > >> > > Best,
> > > > > >> > > David
> > > > > >> > >
> > > > > >> > > On Fri, May 29, 2020 at 6:38 PM Jun Rao <j...@confluent.io> wrote:
> > > > > >> > >
> > > > > >> > > > Hi, David, Anna,
> > > > > >> > > >
> > > > > >> > > > Thanks for the response. Sorry for the late reply.
> > > > > >> > > >
> > > > > >> > > > 10. Regarding exposing rate or usage as quota. Your
> > > > > >> > > > argument is that usage is not very accurate anyway and is
> > > > > >> > > > harder to implement. So, let's just be a bit loose and
> > > > > >> > > > expose rate. I am sort of neutral on that. (1) It seems to
> > > > > >> > > > me that overall usage will be relatively more accurate than
> > > > > >> > > > rate. All the issues that Anna brought up seem to exist for
> > > > > >> > > > rate too. Rate has the additional problem that the cost of
> > > > > >> > > > each request may not be uniform. (2) In terms of
> > > > > >> > > > implementation, a usage-based approach requires tracking
> > > > > >> > > > the user info through the life cycle of an operation.
> > > > > >> > > > However, as you mentioned, things like topic creation can
> > > > > >> > > > generate additional LeaderAndIsr/UpdateMetadata requests.
> > > > > >> > > > Longer term, we probably want to associate those costs with
> > > > > >> > > > the user who initiated the operation. If we do that, we
> > > > > >> > > > sort of need to track the user for the full life cycle of
> > > > > >> > > > the processing of an operation anyway. (3) If you prefer
> > > > > >> > > > rate strongly, I don't have strong objections. However, I
> > > > > >> > > > do feel that the new quota name should be able to cover all
> > > > > >> > > > controller related cost longer term. This KIP currently
> > > > > >> > > > only covers topic creation/deletion. It would not be ideal
> > > > > >> > > > if in the future, we have to add yet another type of quota
> > > > > >> > > > for some other controller related operations. The quota
> > > > > >> > > > name in the KIP has partition mutation. In the future, if
> > > > > >> > > > we allow things like topic renaming, it may not be related
> > > > > >> > > > to partition mutation directly and it would be trickier to
> > > > > >> > > > fit those operations in the current quota. So, maybe
> > > > > >> > > > something more general like controller_operations_quota
> > > > > >> > > > will be more future proof.
> > > > > >> > > >
> > > > > >> > > > 11. Regarding the difference between the token bucket
> > > > > >> > > > algorithm and our current quota mechanism. That's a good
> > > > > >> > > > observation. It seems that we can make a slight change to
> > > > > >> > > > our current quota mechanism to achieve a similar thing. As
> > > > > >> > > > you said, the issue was that we allocate all 7 * 80
> > > > > >> > > > requests in the last 1 sec window in our current mechanism.
> > > > > >> > > > This is a bit unintuitive since each sec only has a quota
> > > > > >> > > > capacity of 5. An alternative way is to allocate the 7 * 80
> > > > > >> > > > requests to all previous windows, each up to the remaining
> > > > > >> > > > quota capacity left. So, you will fill the current 1 sec
> > > > > >> > > > window and all previous ones, each with 5. Then, it seems
> > > > > >> > > > this will give the same behavior as the token bucket? The
> > > > > >> > > > reason that I keep asking this is that from an operational
> > > > > >> > > > perspective, it's simpler if all types of quotas work in
> > > > > >> > > > the same way.
> > > > > >> > > >
> > > > > >> > > > Jun
> > > > > >> > > >
> > > > > >> > > > On Tue, May 26, 2020 at 9:52 AM David Jacot <dja...@confluent.io> wrote:
> > > > > >> > > >
> > > > > >> > > > > Hi folks,
> > > > > >> > > > >
> > > > > >> > > > > I have updated the KIP. As mentioned by Jun, I have made
> > > > > >> > > > > the quota per principal/clientid, similarly to the other
> > > > > >> > > > > quotas. I have also explained how this will work in
> > > > > >> > > > > conjunction with auto topic creation.
> > > > > >> > > > >
> > > > > >> > > > > Please, take a look at it. I plan to call a vote for it
> > > > > >> > > > > in the next few days if there are no comments in this
> > > > > >> > > > > thread.
> > > > > >> > > > >
> > > > > >> > > > > Best,
> > > > > >> > > > > David
> > > > > >> > > > >
> > > > > >> > > > > On Wed, May 13, 2020 at 10:57 AM Tom Bentley <tbent...@redhat.com> wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Hi David,
> > > > > >> > > > > >
> > > > > >> > > > > > Thanks for the explanation and confirmation that
> > > > > >> > > > > > evolving the APIs is not off the table in the longer
> > > > > >> > > > > > term.
> > > > > >> > > > > >
> > > > > >> > > > > > Kind regards,
> > > > > >> > > > > >
> > > > > >> > > > > > Tom
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
