[ https://issues.apache.org/jira/browse/KAFKA-7098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537667#comment-16537667 ]
ASF GitHub Bot commented on KAFKA-7098:
---------------------------------------

hzxa21 opened a new pull request #5350: KAFKA-7098: Improve accuracy of throttling by avoiding under-estimating actual rate in Throttler
URL: https://github.com/apache/kafka/pull/5350

This PR modifies Throttler.scala so that, when throttling is needed, `periodStartNs` is set to the current time rather than to the time before the potential `sleep` call. If `periodStartNs` is reset to the time before `sleep`, the time spent sleeping is included in the window of the next rate calculation. The wider window underestimates the actual rate, so the Throttler may miss a throttling opportunity or sleep for less time than it should. A unit test is also added to cover the fix.

For example, if we use Throttler to limit the per-second rate to 10 with a checkInterval of 1s, the original implementation behaves as follows:

1. 15 events happen during [t0, t0+1s].
2. Throttler sleeps the thread until t0+1.5s, then resets the period start time to t0+1s.
3. 10 events happen during [t0+1.5s, t0+2s]. Throttler does not throttle this time because the estimated rate is `10 / [(t0+2s) - (t0+1s)] = 10`.

But the actual rate during [t0, t0+2s] is `(10+15) / 2 = 12.5 > 10`.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Improve accuracy of the log cleaner throttle rate
> -------------------------------------------------
>
>                 Key: KAFKA-7098
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7098
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Dong Lin
>            Assignee: Dong Lin
>            Priority: Major
>
> LogCleaner uses the Throttler class to throttle the log cleaning rate to the
> user-specified limit, i.e. log.cleaner.io.max.bytes.per.second. However, in
> Throttler.maybeThrottle(), once sleep() returns, periodStartNs is set to the
> time before the sleep, which artificially increases the actual
> window size and under-estimates the actual log cleaning rate. This causes the
> log cleaning IO to be higher than the user-specified limit.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
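
The under-estimation described in the issue and the pull request above can be reproduced with a small simulation. The following is a minimal Python sketch, not Kafka's Throttler.scala: the names `FakeClock`, `ToyThrottler`, and `achieved_rate` are hypothetical, the clock is advanced by hand, and the workload is assumed to produce 15 events per second of work against a limit of 10 per second with a 1s check interval. The `reset_after_sleep` flag switches between resetting the period start to the pre-sleep time (the original behavior as described) and to the time after the sleep (the behavior the PR proposes).

```python
class FakeClock:
    """Manually advanced clock (seconds) so the simulation is deterministic."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class ToyThrottler:
    """Simplified model of the throttling logic described above (hypothetical)."""

    def __init__(self, clock, desired_rate, check_interval, reset_after_sleep):
        self.clock = clock
        self.desired_rate = desired_rate
        self.check_interval = check_interval
        self.reset_after_sleep = reset_after_sleep
        self.period_start = clock.now
        self.observed = 0.0

    def maybe_throttle(self, count):
        self.observed += count
        now = self.clock.now
        elapsed = now - self.period_start
        # Assumption: this model checks once a full interval has elapsed.
        if elapsed >= self.check_interval and self.observed > 0:
            rate = self.observed / elapsed
            if rate > self.desired_rate:
                # Sleep long enough for this window to average desired_rate.
                self.clock.advance(self.observed / self.desired_rate - elapsed)
            # The only difference between the two variants:
            # post-sleep time (proposed) versus pre-sleep time (original).
            self.period_start = self.clock.now if self.reset_after_sleep else now
            self.observed = 0.0


def achieved_rate(reset_after_sleep, cycles=4):
    """Overall events per second after `cycles` bursts of 15 events per 1s of work."""
    clock = FakeClock()
    throttler = ToyThrottler(clock, desired_rate=10.0, check_interval=1.0,
                             reset_after_sleep=reset_after_sleep)
    total = 0
    for _ in range(cycles):
        clock.advance(1.0)            # one second of work
        throttler.maybe_throttle(15)  # 15 events produced in that second
        total += 15
    return total / clock.now


print(achieved_rate(reset_after_sleep=False))  # 12.0: limit of 10 is exceeded
print(achieved_rate(reset_after_sleep=True))   # 10.0: limit is held
```

Under these assumptions, the pre-sleep reset throttles only every other interval, because the sleep time inflates the next window, and the overall rate settles at 12 events per second. The post-sleep reset throttles every interval and holds the limit of 10.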