Ray Mattingly created HBASE-28672:
-------------------------------------

             Summary: Large batch requests can be blocked indefinitely by quotas
                 Key: HBASE-28672
                 URL: https://issues.apache.org/jira/browse/HBASE-28672
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 2.6.0
            Reporter: Ray Mattingly
At my day job we are trying to implement default quotas for a variety of access patterns. We began by introducing a default read IO limit per-user, per-machine; this has been very successful in reducing hotspots, even on clusters with thousands of distinct users.

While implementing a default writes/second throttle, I realized that doing so would put us in a precarious situation where sufficiently large batches may never succeed: if a batch's size exceeds the TimeLimiter's maximum throughput, then the request will always fail in the quota estimation stage. Meanwhile, [IO estimates are deliberately more optimistic|https://github.com/apache/hbase/blob/bdb3f216e864e20eb2b09352707a751a5cf7460f/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/DefaultOperationQuota.java#L192-L193], which can let large requests do targeted oversubscription of an IO quota:

{code:java}
// assume 1 block required for reads. this is probably a low estimate, which is okay
readConsumed = numReads > 0 ? blockSizeBytes : 0;
{code}

This is okay because the Limiter's availability will go negative and force a longer backoff on subsequent requests. I believe this is preferable UX compared to a doomed throttling loop.

In my opinion, we should do something similar in batch request estimation: estimate a batch request's workload at {{Math.min(batchSize, limiterMaxThroughput)}} rather than simply {{batchSize}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
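To illustrate the proposed change, here is a minimal sketch of the clamped estimate. The class and method names are hypothetical, not HBase's actual API; it only demonstrates the Math.min behavior described above.

```java
// Hypothetical sketch -- names are illustrative, not from DefaultOperationQuota.
public class BatchEstimateSketch {

    // Clamp the estimated batch workload to the limiter's max throughput so an
    // oversized batch oversubscribes the quota once instead of failing forever.
    static long estimateBatchWorkload(long batchSize, long limiterMaxThroughput) {
        // After admission, the limiter's availability goes negative, forcing a
        // longer backoff on subsequent requests -- mirroring the deliberately
        // optimistic read IO estimate quoted above.
        return Math.min(batchSize, limiterMaxThroughput);
    }

    public static void main(String[] args) {
        // A batch of 10,000 ops against a 1,000 ops/interval throttle:
        // the raw estimate (10,000) always exceeds the limit and is doomed;
        // the clamped estimate (1,000) is admissible.
        System.out.println(estimateBatchWorkload(10_000, 1_000)); // prints 1000
    }
}
```

With this clamp, a batch larger than the limiter's capacity is admitted at most once per refill window rather than rejected unconditionally at estimation time.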