Ray Mattingly created HBASE-28672:
-------------------------------------

             Summary: Large batch requests can be blocked indefinitely by quotas
                 Key: HBASE-28672
                 URL: https://issues.apache.org/jira/browse/HBASE-28672
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 2.6.0
            Reporter: Ray Mattingly


At my day job we are trying to implement default quotas for a variety of access 
patterns. We began by introducing a default read IO limit per-user, per-machine 
— this has been very successful in reducing hotspots, even on clusters with 
thousands of distinct users.

While implementing a default writes/second throttle, I realized that doing so 
would put us in a precarious situation where large enough batches may never 
succeed. If your batch size is greater than your TimeLimiter's max throughput, 
then you will always fail in the quota estimation stage, no matter how long you 
back off. Meanwhile, [IO estimates are deliberately more 
optimistic|https://github.com/apache/hbase/blob/bdb3f216e864e20eb2b09352707a751a5cf7460f/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/DefaultOperationQuota.java#L192-L193],
 which lets large requests do a targeted oversubscription of an IO quota:

 
{code:java}
// assume 1 block required for reads. this is probably a low estimate, which is okay
readConsumed = numReads > 0 ? blockSizeBytes : 0;
{code}
 

This is okay because the Limiter's availability will go negative and force a 
longer backoff on subsequent requests. I believe this is preferable UX compared 
to a doomed throttling loop.
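
To make that backoff behavior concrete, here is a minimal, self-contained sketch (hypothetical names, not the actual HBase RateLimiter): the limiter admits a request based on an optimistic estimate, charges the real cost afterwards, and a resulting negative availability simply stretches the wait interval for subsequent requests rather than rejecting them forever.

{code:java}
// Minimal illustrative limiter, not HBase's RateLimiter. It admits a request
// based on an optimistic estimate, then charges the real cost afterwards,
// which can push availability negative and stretch the next wait interval.
class OversubscribingLimiter {
  private final long refillPerSecond;
  private long available;

  OversubscribingLimiter(long refillPerSecond) {
    this.refillPerSecond = refillPerSecond;
    this.available = refillPerSecond;
  }

  /** Admit if the (optimistic) estimate fits in what is currently available. */
  boolean canExecute(long estimatedCost) {
    return estimatedCost <= available;
  }

  /** Charge the real cost; availability is allowed to go negative. */
  void consume(long actualCost) {
    available -= actualCost;
  }

  /** A deficit translates into a proportionally longer backoff for later callers. */
  long waitIntervalMs() {
    return available >= 0 ? 0 : (-available * 1000) / refillPerSecond;
  }
}
{code}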

In my opinion, we should do something similar in batch request estimation, by 
estimating a batch request's workload at {{Math.min(batchSize, 
limiterMaxThroughput)}} rather than simply {{batchSize}}.
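
For illustration, a minimal sketch of that clamp, assuming the estimator can see the backing limiter's maximum throughput (that accessor is an assumption, not existing API):

{code:java}
// Hypothetical sketch of the proposed clamp. limiterMaxThroughput stands in for
// whatever the backing TimeLimiter could ever grant in one refill interval; how
// the estimator obtains it is an assumption here.
long estimateBatchConsumed(long batchSize, long limiterMaxThroughput) {
  // Never estimate more than the limiter could possibly admit. The limiter's
  // availability will still go negative if the real cost is larger, forcing a
  // longer backoff on later requests instead of dooming this one.
  return Math.min(batchSize, limiterMaxThroughput);
}
{code}

With this, an oversized batch is admitted once the limiter has refilled to its ceiling, and the oversubscription is paid back through the longer wait it imposes on the requests that follow.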



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
