Ray Mattingly created HBASE-28963:
-------------------------------------

             Summary: Updating "Table Machine Quota Factors" is too expensive
                 Key: HBASE-28963
                 URL: https://issues.apache.org/jira/browse/HBASE-28963
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.6.1
            Reporter: Ray Mattingly
            Assignee: Ray Mattingly


My company is running Quotas across a few hundred clusters of varied size. One 
cluster has hundreds of servers and tens of thousands of regions. We noticed 
that the HMaster was quite busy for this cluster, and after some investigation 
we realized that RegionServers were hammering the HMaster's ClusterMetrics 
endpoint to facilitate the refreshing of table machine quota factors.

There are a few things we could do here. In a perfect world, I think the 
RegionServers would have better peer-to-peer communication of region states, 
and whatever else is necessary, to derive new quota factors themselves. 
Relying solely on the HMaster for this coordination creates a tricky 
bottleneck for the horizontal scalability of clusters.

That said, I think a simpler and preferable first step would be to make our 
code a bit more cost-conscious. At my company, for example, we don't define 
any table-scoped quotas at all. Without any table-scoped quotas in the cache, 
the cache could be much more thoughtful about the work it chooses to do on 
each refresh. So I'm proposing that we check [the size of the tableQuotaCache 
keyset|https://github.com/apache/hbase/blob/db3ba44a4c692d26e70b6030fc519e92fd79f638/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/QuotaCache.java#L418]
 earlier, and use that to determine which ClusterMetrics we bother to fetch.
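A minimal sketch of the idea, assuming the refresh path can be told which metrics it needs. The `Option` enum, the `tableQuotaCache` map, and the method names below are simplified stand-ins (not the actual QuotaCache internals), defined locally so the sketch is self-contained:

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: skip fetching expensive region-level cluster metrics
// when no table-scoped quotas exist in the cache. The Option values loosely
// mirror the shape of org.apache.hadoop.hbase.ClusterMetrics.Option, but are
// redefined here as an assumption, purely for illustration.
public class QuotaRefreshSketch {
    enum Option { SERVERS_NAME, TABLE_TO_REGIONS_COUNT }

    // Stand-in for QuotaCache's tableQuotaCache; keyset size drives the decision.
    static final Map<String, Object> tableQuotaCache = new ConcurrentHashMap<>();

    // Decide which metrics the next refresh should request. With an empty
    // table quota cache, only the cheap server list is needed; the per-table
    // region counts (the expensive part) are skipped entirely.
    static EnumSet<Option> optionsToFetch() {
        if (tableQuotaCache.isEmpty()) {
            return EnumSet.of(Option.SERVERS_NAME);
        }
        return EnumSet.of(Option.SERVERS_NAME, Option.TABLE_TO_REGIONS_COUNT);
    }

    public static void main(String[] args) {
        System.out.println("no table quotas -> " + optionsToFetch());
        tableQuotaCache.put("someTable", new Object());
        System.out.println("table quota present -> " + optionsToFetch());
    }
}
```

The key point is that the cheap check (an empty keyset) happens before the RPC, so clusters that never define table-scoped quotas stop hammering the HMaster for region-level metrics on every refresh.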



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
