gharris1727 commented on PR #12806: URL: https://github.com/apache/kafka/pull/12806#issuecomment-1668440375
Hey @divijvaidya . > aren't we then accepting the current behaviour of quotas implementation as expected behaviour i.e. the deviation could be +- epsilon and epsilon could exceed the thresholds (max.connection.creation.rate) set for a windowing period. It is not possible to eliminate the deviation visible to the outside observer, and these tests will always need to include an error term. Here's what the error bounds will be for the default window configuration and various observation times with the variable-width and fixed-width windowing algorithms: ``` millis variable-width fixed-width 11001 2.4444444444444446 2.2 20001 1.736842105263158 2.2 22001 1.736842105263158 1.5714285714285714 30001 1.5172413793103448 1.5714285714285714 33001 1.5172413793103448 1.375 40001 1.4102564102564104 1.375 44001 1.4102564102564104 1.2790697674418605 50001 1.346938775510204 1.2790697674418605 55001 1.346938775510204 1.2222222222222223 60001 1.305084745762712 1.2222222222222223 66001 1.305084745762712 1.1846153846153846 70001 1.2753623188405796 1.1846153846153846 77001 1.2753623188405796 1.1578947368421053 80001 1.2531645569620253 1.1578947368421053 88001 1.2531645569620253 1.1379310344827587 90001 1.2359550561797752 1.1379310344827587 99001 1.2359550561797752 1.1224489795918366 ``` This PR implements the variable-width limits because the variable-width algorithm is currently on trunk, not because it is more accepted or correct. I'm personally ambivalent about which algorithm should be used. This PR replaces the un-motivated and incorrect hardcoded constants with computed bounds, and that is beneficial with or without the fixed-width algorithm. > Alternatively, the fix i.e. https://github.com/apache/kafka/pull/12045 doesn't require a KIP (it doesn't change any public interfaces). Please feel free to pick up that PR (or duplicate it). I won't have time any time soon to work on it. I will be happy to provide a review. If you don't have time or interest to work that PR and/or KIP, then I think a test-only fix is appropriate until someone is interested in picking up the PR. The flaky tests have already made us aware of the odd behavior of the variable-width algorithm, so more ongoing failures aren't helpful. > Having said that, the answer might be that the current behaviour is accepted behaviour. In which case, I would be comfortable with this change if it is accompanied by a change in docs explaining the current expectations from the windowing algorithm so that the users at least know what their expectation from the quota implementation should be. I don't think that what this PR addresses needs to be explained in the documentation. * The existing quotas documentation is very general, and focuses on the strategic usage of quotas. This extremely specific error bound would be out of place among the other documentation. * How users choose their rate limit depends on so many more important factors than this error term (such as network environment, hardware, etc) that I think to mention the error factor in specific would mislead users to think it is more important than it is. * Users will already be compensating for this error term when setting their limits. If someone observes that the effective rate limit (either for bursts or steady-state flows) is too high and affects their cluster health, they will lower their configured rate limit to compensate. They may be blaming the wrong thing (their hardware vs the rate limit algorithm) but the effect is still the same. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org