This is an automated email from the ASF dual-hosted git repository.

wchevreuil pushed a commit to branch HBASE-27389-rebase
in repository https://gitbox.apache.org/repos/asf/hbase.git

commit 591e01b2dd8eed2ab05c5770f17ded342446a5ac
Author: Rahul Agarkar <[email protected]>
AuthorDate: Thu Nov 2 22:15:38 2023 +0530

    HBASE-28097 Add documentation section for the Cache Aware balancer fu… 
(#5495)
    
    Signed-off-by: Wellington Chevreuil <[email protected]>
---
 src/main/asciidoc/_chapters/architecture.adoc | 43 +++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/src/main/asciidoc/_chapters/architecture.adoc 
b/src/main/asciidoc/_chapters/architecture.adoc
index 23d069c1d91..12bdc09ac76 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -1130,6 +1130,49 @@ For a RegionServer hosting data that can comfortably fit 
into cache, or if your
 
 The compressed BlockCache is disabled by default. To enable it, set 
`hbase.block.data.cachecompressed` to `true` in _hbase-site.xml_ on all 
RegionServers.
 
+==== Cache Aware Load Balancer
+
+Depending on the data size and the configured cache size, the cache warm-up 
can take anywhere from a few minutes to a few hours. This becomes even more 
critical for HBase deployments over cloud storage, where compute is separated 
from storage. Repeating the warm-up every time the region server starts can be 
very expensive. To avoid this, 
link:https://issues.apache.org/jira/browse/HBASE-27313[HBASE-27313] implemented 
the cache persistence feature where the region servers periodically pe [...]
+
+link:https://issues.apache.org/jira/browse/HBASE-27999[HBASE-27999] implements 
the cache aware load balancer, which lets the load balancer consider the cache 
allocation of each region on region servers when calculating a new assignment 
plan. It uses the region/region server cache allocation information reported 
by region servers to calculate the percentage of HFiles cached for each region 
on the hosting server. This information is then used by 
the balancer as a factor whe [...]
+
+The master node captures the caching information from all the region servers 
and uses it to decide on new region assignments while ensuring a minimal 
impact on the current cache allocation. A region is assigned to a region 
server where it has a better cache ratio than on the region server where it is 
currently hosted.
+
+The CacheAwareLoadBalancer uses two cost elements for deciding the region 
allocation. These are described below:
+
+. Cache Cost
++
+
+The cache cost is calculated as the percentage of data for a region cached on 
the region server where it is either currently hosted or was previously hosted. 
A region may have multiple HFiles, each of a different size. An HFile is 
considered to be fully prefetched when all the data blocks in this file are in 
the cache. The region server hosting this region calculates the ratio of the 
number of HFiles fully cached to the total number of HFiles in the 
region. This ratio will vary fr [...]
++
+Every region server maintains this information for all the regions currently 
hosted there. In addition, the cache ratio is also maintained for regions that 
were previously hosted on this region server, which provides historical 
information about those regions.
+
+. Skewness Cost
++
+
+
+The cache aware balancer will consider cache cost with the skewness cost to 
decide on the region assignment plan under the following conditions:
+
+. There is an idle server in the cluster. This can happen when an existing 
server is restarted or a new server is added to the cluster.
+
+. The cost of maintaining the balance in the cluster is greater than the 
minimum threshold defined by the configuration property 
_hbase.master.balancer.stochastic.minCostNeedBalance_.
+
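+The threshold in the second condition is an ordinary balancer property set in the master configuration. A minimal sketch of overriding it is shown here; the value `0.05` is an illustrative placeholder chosen for this example, not a recommendation, and the default may differ between HBase versions:
+
+[source,xml]
+----
+<!-- Illustrative value only (assumption); check the default for your HBase version -->
+<property>
+  <name>hbase.master.balancer.stochastic.minCostNeedBalance</name>
+  <value>0.05</value>
+</property>
+----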
+
+The CacheAwareLoadBalancer can be enabled in the cluster by setting the 
following configuration properties in the master configuration:
+
+[source,xml]
+----
+<property>
+  <name>hbase.master.loadbalancer.class</name>
+  <value>org.apache.hadoop.hbase.master.balancer.CacheAwareLoadBalancer</value>
+</property>
+<property>
+  <name>hbase.bucketcache.persistent.path</name>
+  <value>/path/to/bucketcache_persistent_file</value>
+</property>
+----
+
+
 [[regionserver_splitting_implementation]]
 === RegionServer Splitting Implementation
 
