Repository: kudu
Updated Branches:
  refs/heads/master 42db87b0b -> d4ded71bc


[docs] Add basic advice on setting block cache size

This adds a short section to the troubleshooting guide about improving
the performance of the block cache. It's fuzzy since the
effectiveness of the cache and the efficacy of enlarging it are so
workload dependent (e.g. consider a workload doing full table scans vs.
one mostly re-scanning a small range checking for updates), but I tried
to provide a starting point for users to evaluate their cache size since
we've totally lacked any advice on that up to this point.

I also added a note about a change due for release in 1.8: servers
won't start when the block cache capacity is set too large relative to
the memory limit.

Change-Id: Idc7411c38b6fcc8694509ec89c32e2fe74e6c0db
Reviewed-on: http://gerrit.cloudera.org:8080/11420
Reviewed-by: Adar Dembo <a...@cloudera.com>
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/c36a9fe9
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/c36a9fe9
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/c36a9fe9

Branch: refs/heads/master
Commit: c36a9fe903a4705012d1205206401bf9101803d4
Parents: 42db87b
Author: Will Berkeley <wdberke...@gmail.com>
Authored: Tue Sep 11 10:35:53 2018 -0700
Committer: Will Berkeley <wdberke...@gmail.com>
Committed: Mon Sep 17 17:41:27 2018 +0000

----------------------------------------------------------------------
 docs/troubleshooting.adoc | 87 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 78 insertions(+), 9 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/c36a9fe9/docs/troubleshooting.adoc
----------------------------------------------------------------------
diff --git a/docs/troubleshooting.adoc b/docs/troubleshooting.adoc
index 90791d2..0f37913 100644
--- a/docs/troubleshooting.adoc
+++ b/docs/troubleshooting.adoc
@@ -532,15 +532,84 @@ are several ways to relieve the memory pressure on Kudu:
  Generally, the recommended ratio of maintenance manager threads to data directories is 1:3.
 - Reduce the volume of writes flowing to Kudu on the application side.
 
-Finally, check the value of `--block_cache_capacity_mb`. This setting determines
-the maximum size of Kudu's block cache. While a higher value can help with read
-and write performance, setting it too high (as a percentage of `--memory_limit_hard_bytes`)
-is harmful. Do not raise `--block_cache_capacity_mb` above `--memory_pressure_percentage`
-(default 60%) of `--memory_limit_hard_bytes`, as this will cause Kudu to flush
-aggressively even if write throughput is low. Keeping the block cache capacity
-below 50% of the memory pressure percentage times the hard limit is recommended.
-With the defaults, this means the `--block_cache_capacity_mb` should not exceed
-30% of `--memory_limit_hard_bytes`.
+Finally, on versions of Kudu prior to 1.8, check the value of
+`--block_cache_capacity_mb`. This setting determines the maximum size of Kudu's
+block cache. While a higher value can help with read and write performance,
+do not raise `--block_cache_capacity_mb` above the memory pressure threshold,
+which is `--memory_pressure_percentage` (default 60%) of
+`--memory_limit_hard_bytes`, as this will cause Kudu to flush aggressively even
+if write throughput is low. Keeping `--block_cache_capacity_mb` below 50% of the
+memory pressure threshold is recommended. With the defaults, this means
+`--block_cache_capacity_mb` should not exceed 30% of
+`--memory_limit_hard_bytes`. On Kudu 1.8 and higher, servers will refuse to
+start if the block cache capacity exceeds the memory pressure threshold.
+
+[[block_cache_size]]
+=== Block Cache Size
+
+Kudu uses an LRU cache for recently read data. On workloads that scan a subset
+of the data repeatedly, raising the size of this cache can offer significant
+performance benefits. To increase the amount of memory dedicated to the block
+cache, increase the value of the flag `--block_cache_capacity_mb`. The default
+is 512MiB.
+
+Kudu provides a set of useful metrics for evaluating the performance of the
+block cache, which can be found on the `/metrics` endpoint of the web UI. An
+example set:
+
+[source,json]
+----
+{
+  "name": "block_cache_inserts",
+  "value": 64
+},
+{
+  "name": "block_cache_lookups",
+  "value": 512
+},
+{
+  "name": "block_cache_evictions",
+  "value": 0
+},
+{
+  "name": "block_cache_misses",
+  "value": 96
+},
+{
+  "name": "block_cache_misses_caching",
+  "value": 64
+},
+{
+  "name": "block_cache_hits",
+  "value": 0
+},
+{
+  "name": "block_cache_hits_caching",
+  "value": 352
+},
+{
+  "name": "block_cache_usage",
+  "value": 6976
+}
+----
+
+To judge the efficiency of the block cache on a tablet server, first wait until
+the server has been running and serving normal requests for some time, so the
+cache is not cold. Unless the server stores very little data or is idle,
+`block_cache_usage` should be equal or nearly equal to `block_cache_capacity_mb`.
+Once the cache has reached steady state, compare `block_cache_lookups` to
+`block_cache_misses_caching`. The latter metric counts the number of blocks that
+Kudu expected to read from cache but which weren't found in the cache. If a
+significant number of lookups result in misses on expected cache hits, and the
+`block_cache_evictions` metric is significant compared to `block_cache_inserts`,
+then raising the size of the block cache may provide a performance boost.
+However, the utility of the block cache is highly dependent on workload, so it's
+necessary to test the benefits of a larger block cache.
+
+WARNING: Do not raise the block cache size `--block_cache_capacity_mb` higher
+than the memory pressure threshold (defaults to 60% of `--memory_limit_hard_bytes`).
+As this would cause poor flushing behavior, Kudu servers version 1.8 and higher
+will refuse to start when misconfigured in this way.
 
 [[heap_sampling]]
 === Heap Sampling

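Not part of the commit: the sizing guidance in the patch above can be
sketched in Python roughly as follows. The function names, the flat
list-of-dicts metric shape (copied from the example snippet in the patch
rather than from the full `/metrics` response), and the 10% cut-off for
"significant" are illustrative assumptions, not anything Kudu defines.

```python
MIB = 1024 * 1024

# Sample values copied from the example metrics in the patch.
EXAMPLE_METRICS = [
    {"name": "block_cache_inserts", "value": 64},
    {"name": "block_cache_lookups", "value": 512},
    {"name": "block_cache_evictions", "value": 0},
    {"name": "block_cache_misses_caching", "value": 64},
]


def pressure_threshold_bytes(memory_limit_hard_bytes, memory_pressure_percentage=60):
    """Memory pressure threshold: a percentage (default 60) of the hard limit."""
    return memory_limit_hard_bytes * memory_pressure_percentage // 100


def recommended_max_cache_mb(memory_limit_hard_bytes, memory_pressure_percentage=60):
    """Half of the memory pressure threshold, in MiB (30% of the hard limit by default)."""
    threshold = pressure_threshold_bytes(memory_limit_hard_bytes, memory_pressure_percentage)
    return threshold // 2 // MIB


def exceeds_threshold(block_cache_capacity_mb, memory_limit_hard_bytes,
                      memory_pressure_percentage=60):
    """True if the cache capacity is above the memory pressure threshold."""
    threshold = pressure_threshold_bytes(memory_limit_hard_bytes, memory_pressure_percentage)
    return block_cache_capacity_mb * MIB > threshold


def cache_ratios(metrics):
    """Return (misses_caching / lookups, evictions / inserts) for a list of metrics."""
    by_name = {m["name"]: m["value"] for m in metrics}
    lookups = by_name["block_cache_lookups"]
    inserts = by_name["block_cache_inserts"]
    miss_ratio = by_name["block_cache_misses_caching"] / lookups if lookups else 0.0
    eviction_ratio = by_name["block_cache_evictions"] / inserts if inserts else 0.0
    return miss_ratio, eviction_ratio


def larger_cache_may_help(metrics, significant=0.10):
    """Both ratios must be 'significant'; the 10% cut-off is an arbitrary assumption."""
    miss_ratio, eviction_ratio = cache_ratios(metrics)
    return miss_ratio > significant and eviction_ratio > significant
```

With the sample metrics from the patch, the miss ratio is 64/512 = 12.5%
but there have been no evictions, so this check does not suggest growing
the cache. Even when it does, the patch's caveat still applies: the
benefit is workload dependent and needs to be tested.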