Andrew Kyle Purtell created HBASE-25624:
-------------------------------------------
Summary: Bound LoadBalancer's RegionLocationFinder cache
Key: HBASE-25624
URL: https://issues.apache.org/jira/browse/HBASE-25624
Project: HBase
Issue Type: Bug
Components: Balancer, master, Operability
Affects Versions: 2.4.1, 1.6.0
Reporter: Andrew Kyle Purtell
Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.3
We have a large table in production that causes the balancer's
RegionLocationFinder cache to consume 4 GB of heap. Combined with other
factors, this triggered OOMEs, which is how we became aware of the problem.
RegionLocationFinder embeds a Guava LoadingCache populated by a CacheLoader.
Over time the cache comes to hold RegionInfos for all regions of all tables,
along with the HDFS block locations of all store files of those regions.
The only limit we pass to the CacheBuilder is an expiration time of 14400000
milliseconds (4 hours) for individual cache entries. That is much too long.
Worse, the cache also periodically refreshes itself: the need for a refresh is
checked whenever BaseLoadBalancer calls RegionLocationFinder's
setClusterMetrics() method, and the refresh defeats the expiration-based limit
anyway.
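To illustrate why a periodic refresh defeats write-based expiry, here is a
minimal sketch against plain Guava. It assumes the refresh goes through
LoadingCache#refresh and uses the unshaded com.google.common packages. The
class name, the FakeTicker helper, and the String key and value types are
hypothetical stand-ins, not RegionLocationFinder code.

```java
import com.google.common.base.Ticker;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class RefreshDefeatsExpiry {
  /** Manually advanced clock so the demo does not have to wait for hours. */
  static final class FakeTicker extends Ticker {
    private final AtomicLong nanos = new AtomicLong();

    void advance(long duration, TimeUnit unit) {
      nanos.addAndGet(unit.toNanos(duration));
    }

    @Override
    public long read() {
      return nanos.get();
    }
  }

  /** Cache whose only limit is a 4 hour expireAfterWrite, as described above. */
  static LoadingCache<String, String> newCache(Ticker ticker) {
    return CacheBuilder.newBuilder()
        .ticker(ticker)
        .expireAfterWrite(4, TimeUnit.HOURS)
        .build(new CacheLoader<String, String>() {
          @Override
          public String load(String region) {
            // Hypothetical stand-in for computing block locations.
            return "locations-of-" + region;
          }
        });
  }

  /** Returns true if the entry is still cached 6 hours after the first load. */
  static boolean survivesSixHours(boolean refreshHalfway) {
    FakeTicker ticker = new FakeTicker();
    LoadingCache<String, String> cache = newCache(ticker);
    cache.getUnchecked("region-a");      // initial load at t = 0
    ticker.advance(3, TimeUnit.HOURS);
    if (refreshHalfway) {
      cache.refresh("region-a");         // reload counts as a new write
    }
    ticker.advance(3, TimeUnit.HOURS);   // t = 6h, past the 4h expiry
    return cache.getIfPresent("region-a") != null;
  }

  public static void main(String[] args) {
    System.out.println("without refresh: " + survivesSixHours(false));
    System.out.println("with refresh:    " + survivesSixHours(true));
  }
}
```

A refresh replaces the value, which resets the write timestamp, so an entry
that is refreshed more often than the expiry interval never ages out.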
We should bound this cache with effective resource controls. Time-based
expiry is fine, but the periodic refresh logic must be removed for it to be
effective. We should implement size-based limits too. CacheBuilder#maximumSize
limits by number of cache entries. This might be fine, but
CacheBuilder#maximumWeight would be better, where the weight is determined by
the API user. In this case it can be an estimate of the heap size of the hash
map entries kept in the cache.
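As a rough sketch of the weight-based option, the following bounds a Guava
cache with CacheBuilder#maximumWeight and a Weigher. This is an illustration
under stated assumptions, not a proposed patch: the class name, the overhead
constants, the estimateWeight heuristic, and the String-to-Map types standing
in for RegionInfo and its block distribution are all hypothetical.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.Weigher;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class WeightBoundedLocationCache {
  // Hypothetical overhead guesses in bytes; these are not measured values.
  static final int ENTRY_OVERHEAD_BYTES = 64;
  static final int PER_HOST_BYTES = 48;

  /** Crude heap estimate for one entry: fixed overhead plus 2 bytes per name char. */
  static int estimateWeight(String region, Map<String, Long> hostToBytes) {
    long weight = ENTRY_OVERHEAD_BYTES + 2L * region.length();
    for (String host : hostToBytes.keySet()) {
      weight += PER_HOST_BYTES + 2L * host.length();
    }
    return (int) Math.min(Integer.MAX_VALUE, weight);
  }

  /** Builds a cache limited by both entry age and total estimated weight. */
  static LoadingCache<String, Map<String, Long>> newCache(long maxWeightBytes) {
    Weigher<String, Map<String, Long>> weigher =
        WeightBoundedLocationCache::estimateWeight;
    return CacheBuilder.newBuilder()
        .concurrencyLevel(1) // single segment keeps the demo deterministic
        .expireAfterWrite(4, TimeUnit.HOURS)
        .maximumWeight(maxWeightBytes)
        .weigher(weigher)
        .build(new CacheLoader<String, Map<String, Long>>() {
          @Override
          public Map<String, Long> load(String region) {
            // Hypothetical stand-in for looking up HDFS block locations.
            Map<String, Long> hosts = new HashMap<>();
            for (int i = 0; i < 3; i++) {
              hosts.put("host-" + i, 128L * 1024 * 1024);
            }
            return hosts;
          }
        });
  }

  /** Loads regionCount entries and reports how many the cache retained. */
  static long sizeAfterLoading(long maxWeightBytes, int regionCount) {
    LoadingCache<String, Map<String, Long>> cache = newCache(maxWeightBytes);
    for (int i = 0; i < regionCount; i++) {
      cache.getUnchecked("region-" + i);
    }
    cache.cleanUp();
    return cache.size();
  }

  public static void main(String[] args) {
    // Each demo entry weighs 260 by the estimate above, so at most 3 fit in 1000.
    System.out.println("entries retained: " + sizeAfterLoading(1000, 10));
  }
}
```

The weight is computed once, when an entry is created or replaced, so the
estimate only needs to be cheap and roughly proportional to the real heap
cost. Guava may evict entries before the limit is reached, so the bound is a
ceiling rather than a target.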