dajac commented on a change in pull request #8977:
URL: https://github.com/apache/kafka/pull/8977#discussion_r452162828



##########
File path: 
clients/src/main/java/org/apache/kafka/common/metrics/stats/TokenBucket.java
##########
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.kafka.common.metrics.stats;
+
+import java.util.concurrent.TimeUnit;
+import org.apache.kafka.common.metrics.MetricConfig;
+
+import java.util.List;
+
+/**
+ * A {@link SampledStat} that mimics the behavior of a Token Bucket and is meant to
+ * be used in conjunction with a {@link Rate} and a {@link org.apache.kafka.common.metrics.Quota}.
+ *
+ * The {@link TokenBucket} considers each sample as the amount of credits spent during the sample's
+ * window while giving back credits based on the defined quota.
+ *
+ * At time T, it computes the total O as follows:
+ * - O(T) = max(0, O(T-1) - Q * (W(T) - W(T-1))) + S(T)
+ * Where:
+ * - Q is the defined Quota or 0 if undefined
+ * - W is the time of the sample or now if undefined
+ * - S is the value of the sample or 0 if undefined
+ *
+ * Example with 3 samples with a Quota = 2:
+ * - S1 at T+0s => 4
+ * - S2 at T+2s => 2
+ * - S3 at T+4s => 6
+ *
+ * The total at T+6s is computed as follows:
+ * - T0 => Total at T+0s => S1 = 4
+ * - T1 => Total at T+2s => max(0, T0 - Q * dT) + S2 = max(0, 4 - 2 * 2) + 2 = 2
+ * - T2 => Total at T+4s => max(0, T1 - Q * dT) + S3 = max(0, 2 - 2 * 2) + 6 = 6
+ * - T3 => Total at T+6s => max(0, T2 - Q * dT) = max(0, 6 - 2 * 2) = 2
+ */
+public class TokenBucket extends SampledStat {
+
+    private final TimeUnit unit;
+
+    public TokenBucket() {
+        this(TimeUnit.SECONDS);
+    }
+
+    public TokenBucket(TimeUnit unit) {
+        super(0);
+        this.unit = unit;
+    }
+
+    @Override
+    protected void update(Sample sample, MetricConfig config, double value, long now) {
+        sample.value += value;
+    }
+
+    @Override

Review comment:
   My first implementation was doing what you described: backfilling past samples up to quota * sample length and putting the remainder in the last (or current) sample. You seem to use "sample" in the singular quite often in your description, so I wonder whether you also meant to backfill all the past samples. Did you?
   
   It turned out that backfilling past samples is not as straightforward as it seems because, as you pointed out in your other comment, we may have holes or no past samples at all. Let me take a few examples to illustrate.
   
   Let's assume that we have the following settings:
   - samples = 6
   - window = 1s
   - quota = 5
   
   Example 1:
   Let's assume that we record 2 at T and 30 at T+5. When 2 is recorded, a first sample at T is created with the value 2. Then, we record 30 at T+5. As it is above the quota of the current sample, we want to backfill T, T+1, T+2, T+3, T+4 and put the remainder in T+5. This means that we have to create samples for T+1, T+2, T+3 and T+4 because they don't exist in memory. We would end up with the following samples in memory:
   T = 5, T+1 = 5, T+2 = 5, T+3 = 5, T+4 = 5, T+5 = 7.
   
   Example 2:
   Let's assume that we record 2 at T, 3 at T+3 and 30 at T+5. The mechanics are similar to the previous example, with the difference that it requires adding new samples between existing past samples, T+1 and T+2, as well as adding sample T+4. The older windows can absorb 20 credits (3 into T, 5 each into T+1, T+2 and T+4, and 2 into T+3), leaving 30 - 20 = 10 in T+5. We would end up with the following samples in memory:
   T = 5, T+1 = 5, T+2 = 5, T+3 = 5, T+4 = 5, T+5 = 10.
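   To make the backfilling mechanics concrete, here is a rough sketch. This is hypothetical illustration code, not the PR's implementation: it keeps the samples in a dense array indexed by window and, for simplicity, always tops up older windows oldest-first before putting the remainder in the current one. The class and method names are made up for the example.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the backfilling idea: each recorded value first
 * tops up every older window up to quota * window length, and whatever is
 * left lands in the current window (which may exceed the quota).
 */
public class BackfillSketch {

    /**
     * records[i] = {windowIndex, value}, in chronological order;
     * returns the per-window sample values after backfilling.
     */
    static double[] backfill(int numWindows, double quotaPerWindow, double[][] records) {
        double[] samples = new double[numWindows];
        for (double[] record : records) {
            int current = (int) record[0];
            double remaining = record[1];
            // Top up every window strictly before the current one, oldest first.
            for (int w = 0; w < current && remaining > 0; w++) {
                double fill = Math.min(quotaPerWindow - samples[w], remaining);
                samples[w] += fill;
                remaining -= fill;
            }
            // The remainder goes into the current window.
            samples[current] += remaining;
        }
        return samples;
    }

    public static void main(String[] args) {
        // Example 1: record 2 at T and 30 at T+5, with quota = 5 and window = 1s.
        double[] ex1 = backfill(6, 5, new double[][] {{0, 2}, {5, 30}});
        System.out.println(Arrays.toString(ex1)); // [5.0, 5.0, 5.0, 5.0, 5.0, 7.0]
    }
}
```

   Note that the remainder in the current window is always the recorded total minus the room topped up in the older windows, so the sum of all samples equals the sum of all recorded values.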
   
   With the current base implementation of SampledStat, adding all the past samples, especially in between existing past samples, is quite complex and possibly error-prone. The simplest approach would be to always allocate all the samples, but this is something I would like to avoid.
   
   Following this observation, I started to look at the problem from a different angle. Instead of spreading values at record time, I thought that we could adjust them while combining the samples. The idea is to run a token bucket algorithm in the combine method based on all the non-expired samples.
   
   My implementation is basically a reversed Token Bucket: a token bucket which starts at zero instead of starting at the maximum number of tokens.
   
   Let's define the following:
   * TK = The current number of tokens in the bucket
   * B = The maximum burst (number of samples * sample length in our case)
   * R = The rate at which we decrease tokens (quota * sample length in our case)
   * T = The last time TK got updated
   
   At time now, we update the number of tokens in the bucket with:
   * TK = max(TK - (now - T) * R, 0)
   
   A request with a burst worth of K is admitted if TK < B, and updates TK = TK + K.
   
   My implementation basically runs this, sample by sample in chronological order, in the combine method, modulo the admission check, which is done in the Rate.
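   As an illustration, here is a minimal sketch of that combine-time pass. It is hypothetical code, not the PR's implementation, and it assumes the rate is expressed per time unit (i.e. the quota itself) and that samples are already in chronological order:

```java
/**
 * Hypothetical sketch of a "reversed" token bucket run over the non-expired
 * samples in chronological order, as the combine step would do. The bucket
 * starts at zero and drains at the given rate; each sample adds back the
 * credits spent in its window.
 */
public class ReversedTokenBucketSketch {

    /**
     * times[i]/values[i] describe the samples in chronological order;
     * rate is the quota (credits drained per time unit).
     */
    static double combine(double[] times, double[] values, double rate, double now) {
        double tokens = 0;
        double last = times.length > 0 ? times[0] : now;
        for (int i = 0; i < times.length; i++) {
            // Drain the credits given back since the previous sample, floored at 0 ...
            tokens = Math.max(tokens - (times[i] - last) * rate, 0);
            // ... then add the credits spent in this sample's window.
            tokens += values[i];
            last = times[i];
        }
        // Drain from the last sample up to now.
        return Math.max(tokens - (now - last) * rate, 0);
    }

    public static void main(String[] args) {
        // The Javadoc example: quota = 2, samples 4 @ T, 2 @ T+2, 6 @ T+4, read at T+6.
        System.out.println(combine(new double[] {0, 2, 4}, new double[] {4, 2, 6}, 2, 6));
    }
}
```

   Running it on the Javadoc example above (quota = 2, samples 4 at T, 2 at T+2, 6 at T+4, read at T+6) gives 2, matching the hand computation.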
   
   It seems to me that this is close to our original token-bucket-based proposal, and also pretty close to our idea of backfilling old samples.
   
   Does this help to clarify my proposal? WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

