rmdmattingly opened a new pull request, #5773:
URL: https://github.com/apache/hbase/pull/5773

   See https://issues.apache.org/jira/browse/HBASE-28453
   
   The AverageIntervalRateLimiter produces tiny wait intervals. At best this 
gives poor UX; at worst, throttled clients retry so quickly that they 
effectively DDoS the cluster. See below, where we implemented a 10k 
request/second/machine quota at 10:43 and saw requests fall into immediate 
retry loops and inundate the cluster:
   
![](https://issues.apache.org/jira/secure/attachment/13067588/Screenshot%202024-03-21%20at%202.30.01%E2%80%AFPM.png)
   
   
   The FixedIntervalRateLimiter produces large wait intervals, which make it 
difficult to fully utilize a quota. See below, where we were consistently stuck 
at only ~20% utilization of our quota. After restarting the cluster to use a 
FixedIntervalRateLimiter with a 100ms refill interval, we achieved much better 
utilization:
   
   ![Screenshot 2024-03-22 at 9 42 52 AM](https://github.com/apache/hbase/assets/21689053/0c813e41-27ae-4227-8521-6e4a25b0c225)
   
   
   As suggested above, this PR introduces support for a refill interval that is 
less than or equal to the TimeUnit of a FixedIntervalRateLimiter. This means 
that you can define a quota in a straightforward way, like 100MB/sec, while 
also stating, for example, that you're willing to refill it every 100ms. In 
that case, retries for small/normal requests will often wait only ~100ms.
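
To make the arithmetic concrete, here is a minimal sketch of a fixed-interval limiter with a sub-TimeUnit refill interval. It is not the code in this PR: the class `RefillIntervalSketch` and all of its methods are hypothetical, time is passed in explicitly as milliseconds, and it assumes each refill adds `limit * refillInterval / timeUnit` tokens, capped at the full limit.

```java
/** Hypothetical illustration only; not HBase's FixedIntervalRateLimiter. */
final class RefillIntervalSketch {
  private final long limit;            // tokens allowed per time unit, e.g. 10_000
  private final long refillIntervalMs; // how often tokens are added, e.g. 100
  private final long refillAmount;     // tokens added per refill
  private long available;              // tokens currently available
  private long nextRefillMs;           // timestamp of the next refill

  RefillIntervalSketch(long limit, long timeUnitMs, long refillIntervalMs, long nowMs) {
    if (refillIntervalMs <= 0 || refillIntervalMs > timeUnitMs) {
      throw new IllegalArgumentException("refill interval must be in (0, timeUnit]");
    }
    // Assumption: each refill grants a proportional share of the limit.
    this.refillAmount = limit * refillIntervalMs / timeUnitMs;
    if (this.refillAmount < 1) {
      throw new IllegalArgumentException("limit too small for this refill interval");
    }
    this.limit = limit;
    this.refillIntervalMs = refillIntervalMs;
    this.available = limit;
    this.nextRefillMs = nowMs + refillIntervalMs;
  }

  /** Adds tokens for every refill interval that has fully elapsed. */
  private void refill(long nowMs) {
    if (nowMs < nextRefillMs) {
      return;
    }
    long intervals = (nowMs - nextRefillMs) / refillIntervalMs + 1;
    available = Math.min(limit, available + intervals * refillAmount);
    nextRefillMs += intervals * refillIntervalMs;
  }

  /** Consumes tokens if enough are available; returns false otherwise. */
  boolean tryConsume(long amount, long nowMs) {
    refill(nowMs);
    if (available < amount) {
      return false;
    }
    available -= amount;
    return true;
  }

  /** Milliseconds until enough refills have happened to cover the amount. */
  long waitIntervalMs(long amount, long nowMs) {
    refill(nowMs);
    if (available >= amount) {
      return 0;
    }
    long missing = amount - available;
    long refillsNeeded = (missing + refillAmount - 1) / refillAmount; // ceiling
    return (nextRefillMs - nowMs) + (refillsNeeded - 1) * refillIntervalMs;
  }

  public static void main(String[] args) {
    // 10k requests/sec, refilled every 100ms, starting at t=0.
    RefillIntervalSketch limiter = new RefillIntervalSketch(10_000, 1_000, 100, 0);
    limiter.tryConsume(10_000, 0); // drain the quota
    System.out.println(limiter.waitIntervalMs(1, 0)); // prints 100
  }
}
```

In this sketch, a drained 10k/sec limiter with a 100ms refill interval asks a small request to wait 100ms; with the refill interval equal to the TimeUnit, the same request would wait the full 1000ms.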
   
   Simply set `hbase.quota.rate.limiter.refill.interval.ms` to your desired 
refill interval, and restart your RegionServers, to make use of this feature. 
By default the refill interval will just equal the TimeUnit, so this is a no-op 
without explicit configuration.
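
The defaulting rule can be sketched as follows. Only the configuration key comes from this PR; the class `RefillIntervalConfigSketch`, the plain `Map` standing in for the real configuration object, and the choice to reject out-of-range values with an exception are assumptions made for illustration.

```java
/** Hypothetical illustration of the defaulting rule; not the PR's code. */
final class RefillIntervalConfigSketch {
  static final String REFILL_INTERVAL_KEY = "hbase.quota.rate.limiter.refill.interval.ms";

  /**
   * Returns the configured refill interval, or the time unit itself when the
   * key is unset, which keeps the behavior unchanged without configuration.
   */
  static long effectiveRefillIntervalMs(java.util.Map<String, String> conf, long timeUnitMs) {
    String raw = conf.get(REFILL_INTERVAL_KEY);
    if (raw == null) {
      return timeUnitMs; // default: refill once per time unit
    }
    long value = Long.parseLong(raw.trim());
    // Assumption: values outside (0, timeUnit] are rejected rather than clamped.
    if (value <= 0 || value > timeUnitMs) {
      throw new IllegalArgumentException(
        REFILL_INTERVAL_KEY + " must be in (0, " + timeUnitMs + "], got " + value);
    }
    return value;
  }

  public static void main(String[] args) {
    java.util.Map<String, String> conf = java.util.Map.of(REFILL_INTERVAL_KEY, "100");
    System.out.println(effectiveRefillIntervalMs(conf, 1_000)); // prints 100
  }
}
```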
   
   @hgromer @eab148 @bozzkar @bbeaudreault 

