Bryan Beaudreault created HBASE-27704:
-----------------------------------------
Summary: Quotas can drastically overflow configured limit
Key: HBASE-27704
URL: https://issues.apache.org/jira/browse/HBASE-27704
Project: HBase
Issue Type: Bug
Reporter: Bryan Beaudreault
Attachments: Screenshot 2023-03-10 at 5.17.51 PM.png
The original implementation did not allow exceeding the quota. For example, if you
specify a limit of 10 resources/sec and consume 20 resources, it takes 1.1
seconds before you are able to submit another request. This was covered by the
[testOverconsumption in
TestRateLimiter|https://github.com/apache/hbase/blame/587b0b4f20bdc0415b6541023e611b69c87dba15/hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestRateLimiter.java#L97].
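To illustrate where the 1.1 seconds comes from, here is a minimal sketch of the arithmetic. It is not the actual RateLimiter code: the class and method names are hypothetical, and it assumes a continuous refill at the configured rate with a balance that is allowed to go negative.

```java
// Illustrative sketch only; NOT the HBase RateLimiter implementation.
// Class and method names are hypothetical.
final class OverconsumptionWait {
    /**
     * Milliseconds until at least one resource is available again after
     * consuming `amount` out of `available`, when the balance may go negative
     * and is refilled at `limitPerSec` resources per second.
     */
    static long waitMillisUntilOneAvailable(long limitPerSec, long available, long amount) {
        long balance = available - amount;   // e.g. 10 - 20 = -10
        if (balance >= 1) {
            return 0;                        // nothing to wait for
        }
        long needed = 1 - balance;           // e.g. 11 resources must be refilled
        // Round up: ceil(needed * 1000 / limitPerSec)
        return (needed * 1000 + limitPerSec - 1) / limitPerSec;
    }

    public static void main(String[] args) {
        System.out.println(waitMillisUntilOneAvailable(10, 10, 20) + " ms");
    }
}
```

With a limit of 10 resources/sec, 10 available and 20 consumed, the balance is -10, so 11 resources have to be refilled before 1 is available, which takes 1100 ms.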
As an incidental part of HBASE-13686, that logic was changed. There is no
mention of the reasoning behind the change in the issue comments or on the review
board, so I think it went unnoticed. The goal of that issue was to add different
refill strategies, but it also modified the overconsumption behavior. The
testOverconsumption was [split out for both refill
strategies|https://github.com/apache/hbase/blame/master/hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestRateLimiter.java#L104-L159],
but the core reasoning was lost. The comment still says:
{code:java}
// 10 resources are available, but we need to consume 20 resources
// Verify that we have to wait at least 1.1sec to have 1 resource available
{code}
But the actual test was updated to only require waiting 100ms for a new
resource. This is incorrect.
The problem is that when consuming, if the available amount would go negative, it is reset to 0
[here|https://github.com/apache/hbase/blame/master/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/RateLimiter.java#L187-L191].
Additionally, when refilling, the new logic does a Math.max(0, refillAmount):
[here|https://github.com/apache/hbase/blame/master/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/RateLimiter.java#L159-L163].
So it's effectively impossible for the available amount to go below 0, which
means overconsumption is never paid back. That is impractical for a rate limiter.
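To make the effect concrete, here is a toy simulation comparing a balance that is clamped at 0 with one that is allowed to go negative. It is only a sketch and not the HBase code: the class, the 100 ms tick, and the admission rule (a request is admitted whenever at least 1 resource is available, and its full cost is charged afterwards) are all assumptions made for illustration.

```java
// Toy model only; NOT the HBase RateLimiter. All names are hypothetical.
final class ClampVsDebtSimulation {
    /**
     * Simulates `ticks` steps of 100 ms each. A request costing `costPerRequest`
     * is admitted whenever at least 1 resource is available. Returns the total
     * amount consumed over the simulated period.
     */
    static long simulate(long limitPerSec, long costPerRequest, int ticks, boolean clampAtZero) {
        long refillPerTick = limitPerSec / 10; // 100 ms is one tenth of a second
        long avail = limitPerSec;              // start with a full bucket
        long consumed = 0;
        for (int t = 0; t < ticks; t++) {
            if (t > 0) {
                // Refill, never exceeding the configured limit.
                avail = Math.min(limitPerSec, avail + refillPerTick);
            }
            if (avail >= 1) {
                consumed += costPerRequest;
                avail -= costPerRequest;
                if (clampAtZero && avail < 0) {
                    avail = 0;                 // the debt is forgotten
                }
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println("clamped at 0:   " + simulate(10, 20, 100, true));
        System.out.println("negative debt:  " + simulate(10, 20, 100, false));
    }
}
```

Under these assumptions, with a limit of 10 resources/sec and requests costing 20 resources, the clamped variant admits a request on every tick and consumes 2000 resources in 10 simulated seconds. The variant that carries the debt consumes 120, which stays close to the configured rate.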
With this setup it's very easy to drastically overconsume the rate limiter. See
the attached screenshot, which shows two humps. The first hump is with the current
logic; the second is with my fix, which removes both of those problems. The
rate limit was set to 500 mb/s, but I was easily able to go over 700 mb/s
without the fix.