Make tcmalloc heap sampling more useful

Previously our /pprof/heap path required that the process be started
with lifetime heap profiling enabled. This mode of operation is very
slow so we would never be able to use it in production. Even in tests
it's of limited utility.

The alternative is tcmalloc heap _sampling_ which only records periodic
allocations. According to recent mailing list threads, Google runs most
of their binaries with this enabled in production, so it's designed to
be very low overhead and potentially useful for us in the field.

This doesn't go so far as to enable it by default, but it does adjust
the /pprof/heap endpoint to dump the sampled allocations instead of
starting a profile. Sampling can be enabled using either the tcmalloc
env variable (TCMALLOC_SAMPLE_PARAMETER) or a new command line flag

I tested this manually by running a kudu tserver with loadgen and
sampling set to 500kb. I then used pprof to grab the heap sample
information which looked like I expected it to (most of the memory from

Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255
Tested-by: Kudu Jenkins
Reviewed-by: Will Berkeley <>


Branch: refs/heads/master
Commit: 15f49edd64bdb7b1714287c5e3f8af59eb1593a6
Parents: b0dc9ea
Author: Todd Lipcon <>
Authored: Thu Feb 8 18:22:52 2018 -0800
Committer: Todd Lipcon <>
Committed: Thu Feb 22 21:02:08 2018 +0000

 docs/troubleshooting.adoc              | 34 ++++++++++++++++++++++
 src/kudu/server/ | 42 +++++++++++++++------------
 src/kudu/util/                 | 45 ++++++++++++++++++++++++-----
 3 files changed, 95 insertions(+), 26 deletions(-)
diff --git a/docs/troubleshooting.adoc b/docs/troubleshooting.adoc
index 95557e3..a0a0b23 100644
--- a/docs/troubleshooting.adoc
+++ b/docs/troubleshooting.adoc
@@ -449,6 +449,40 @@ are several ways to relieve the memory pressure on Kudu:
   Generally, the recommended ratio of maintenance manager threads to data 
directories is 1:3.
 - Reduce the volume of writes flowing to Kudu on the application side.
+=== Heap Sampling
+For advanced debugging of memory usage, administrators may enable heap 
sampling on Kudu daemons.
+This allows Kudu developers to associate memory usage with the specific lines 
of code and data
+structures responsible. When reporting a bug related to memory usage or an 
apparent memory leak,
+heap profiling can give quantitative data to pinpoint the issue.
+WARNING: Heap sampling is an advanced troubleshooting technique and may cause 
+degradation or instability of the Kudu service. Currently it is not 
recommended to enable this
+in a production environment unless specifically requested by the Kudu 
development team.
+To enable heap sampling on a Kudu daemon, pass the flag 
+If heap sampling is enabled, the current sampled heap occupancy can be 
retrieved over HTTP
+by visiting `` or
+``. The output is a machine-readable 
dump of the
+stack traces with their associated heap usage.
+Rather than visiting the heap profile page directly in a web browser, it is 
+more useful to use the `pprof` tool that is distributed as part of the 
+open source project. For example, a developer with a local build tree can use 
+following command to collect the sampled heap usage and output an SVG diagram:
+thirdparty/installed/uninstrumented/bin/pprof -svg  
'http://localhost:8051/pprof/heap' > /tmp/heap.svg
+The resulting SVG may be visualized in a web browser or sent to the Kudu 
community to help
+troubleshoot memory occupancy issues.
+TIP: Heap samples contain only summary information about allocations and do 
not contain any
+_data_ from the heap. It is safe to share heap samples in public without fear 
of exposing
+confidential or sensitive data.
 === Disk Issues
diff --git a/src/kudu/server/ 
index 040daa0..d371d52 100644
--- a/src/kudu/server/
+++ b/src/kudu/server/
@@ -90,27 +90,33 @@ static void PprofHeapHandler(const Webserver::WebRequest& 
                              Webserver::PrerenderedWebResponse* resp) {
   ostringstream* output = resp->output;
-  *output << "Heap profiling is not available without tcmalloc.";
+  *output << "%warn Heap profiling is not available without tcmalloc.\n";
-  // Remote (on-demand) profiling is disabled if the process is already being 
-  if (FLAGS_enable_process_lifetime_heap_profiling) {
-    *output << "Heap profiling is running for the process lifetime.";
+  // If we've started the process with heap profiling then dump the full 
+  if (IsHeapProfilerRunning()) {
+    char* profile = GetHeapProfile();
+    *output << profile;
+    free(profile);
-  auto it = req.parsed_args.find("seconds");
-  int seconds = kPprofDefaultSampleSecs;
-  if (it != req.parsed_args.end()) {
-    seconds = atoi(it->second.c_str());
+  // Otherwise dump the sample.
+  string buf;
+  MallocExtension::instance()->GetHeapSample(&buf);
+  if (buf.find("This heap profile does not have any data") != string::npos) {
+    // If sampling is disabled, tcmalloc prints a nice message with 
+    // how to enable it. However, it only describes the environment variable 
+    // and not our own gflag-based method, so we'll replace it with our own 
+    buf = "%warn\n"
+        "%warn This heap profile does not have any data in it, because\n"
+        "%warn the application was run with heap sampling turned off.\n"
+        "%warn To obtain a heap sample, you must set the environment\n"
+        "%warn variable TCMALLOC_SAMPLE_PARAMETER or the flag\n"
+        "%warn --heap-sample-every-n-bytes to a positive sampling period,\n"
+        "%warn such as 524288.\n"
+        "%warn\n";
-  HeapProfilerStart(FLAGS_heap_profile_path.c_str());
-  // Sleep to allow for some samples to be collected.
-  SleepFor(MonoDelta::FromSeconds(seconds));
-  const char* profile = GetHeapProfile();
-  HeapProfilerStop();
-  *output << profile;
-  delete profile;
+  *output << buf;
@@ -121,7 +127,7 @@ static void PprofCpuProfileHandler(const 
Webserver::WebRequest& req,
                                    Webserver::PrerenderedWebResponse* resp) {
   ostringstream* output = resp->output;
-  *output << "CPU profiling is not available without tcmalloc.";
+  *output << "%warn CPU profiling is not available without tcmalloc.\n";
   auto it = req.parsed_args.find("seconds");
   int seconds = kPprofDefaultSampleSecs;
@@ -149,7 +155,7 @@ static void PprofCpuProfileHandler(const 
Webserver::WebRequest& req,
 static void PprofGrowthHandler(const Webserver::WebRequest& /*req*/,
                                Webserver::PrerenderedWebResponse* resp) {
-  *resp->output << "Growth profiling is not available without tcmalloc.";
+  *resp->output << "%warn Growth profiling is not available without 
   string heap_growth_stack;
diff --git a/src/kudu/util/ b/src/kudu/util/
index 2a7398f..e091d3d 100644
--- a/src/kudu/util/
+++ b/src/kudu/util/
@@ -17,7 +17,6 @@
 #include "kudu/util/flags.h"
-#include <unistd.h>
 #include <cstdlib>
 #include <functional>
@@ -28,6 +27,7 @@
 #include <vector>
 #include <sys/stat.h>
+#include <unistd.h> // IWYU pragma: keep
 #include <boost/algorithm/string/predicate.hpp>
 #include <gflags/gflags.h>
@@ -73,10 +73,10 @@ DEFINE_bool(dump_metrics_json, false,
             "by this binary.");
 TAG_FLAG(dump_metrics_json, hidden);
 DEFINE_bool(enable_process_lifetime_heap_profiling, false, "Enables heap "
     "profiling for the lifetime of the process. Profile output will be stored 
in the "
-    "directory specified by -heap_profile_path. Enabling this option will 
disable the "
-    "on-demand/remote server profile handlers.");
+    "directory specified by -heap_profile_path.");
 TAG_FLAG(enable_process_lifetime_heap_profiling, stable);
 TAG_FLAG(enable_process_lifetime_heap_profiling, advanced);
@@ -85,6 +85,16 @@ DEFINE_string(heap_profile_path, "", "Output path to store 
heap profiles. If not
 TAG_FLAG(heap_profile_path, stable);
 TAG_FLAG(heap_profile_path, advanced);
+DEFINE_int64(heap_sample_every_n_bytes, 0,
+             "Enable heap occupancy sampling. If this flag is set to some 
positive "
+             "value N, a memory allocation will be sampled approximately every 
N bytes. "
+             "Lower values of N incur larger overhead but give more accurate 
results. "
+             "A value such as 524288 (512KB) is a reasonable choice with 
relatively "
+             "low overhead.");
+TAG_FLAG(heap_sample_every_n_bytes, advanced);
+TAG_FLAG(heap_sample_every_n_bytes, experimental);
 DEFINE_bool(disable_core_dumps, false, "Disable core dumps when this process 
 TAG_FLAG(disable_core_dumps, advanced);
 TAG_FLAG(disable_core_dumps, evolving);
@@ -319,6 +329,16 @@ TAG_FLAG(helpxml, advanced);
 TAG_FLAG(version, stable);
+// TCMalloc flags.
+// These are tricky because tcmalloc doesn't use gflags. So we have to
+// reach into its internal namespace.
+namespace TCM_NAMESPACE {
+extern int64_t FLAGS_tcmalloc_sample_parameter;
+} // namespace TCM_NAMESPACE
 namespace kudu {
 // After flags have been parsed, the umask value is filled in here.
@@ -486,11 +506,6 @@ void HandleCommonFlags() {
-  if (FLAGS_heap_profile_path.empty()) {
-    FLAGS_heap_profile_path = strings::Substitute(
-        "/tmp/$0.$1", google::ProgramInvocationShortName(), getpid());
-  }
   if (FLAGS_disable_core_dumps) {
@@ -498,9 +513,23 @@ void HandleCommonFlags() {
+  if (FLAGS_heap_profile_path.empty()) {
+    FLAGS_heap_profile_path = strings::Substitute(
+        "/tmp/$0.$1", google::ProgramInvocationShortName(), getpid());
+  }
   if (FLAGS_enable_process_lifetime_heap_profiling) {
+  // Set the internal tcmalloc flag unless it was already set using the 
+  // environment-variable-based method. It doesn't appear that this is settable
+  // in any less hacky fashion.
+  if (!getenv("TCMALLOC_SAMPLE_PARAMETER")) {
+    TCM_NAMESPACE::FLAGS_tcmalloc_sample_parameter = 
+  } else if 
(!google::GetCommandLineFlagInfoOrDie("heap_sample_every_n_bytes").is_default) {
+    LOG(ERROR) << "Heap sampling configured using both 
--heap-sample-every-n-bytes and "
+               << "TCMALLOC_SAMPLE_PARAMETER. Ignoring command line flag.";
+  }

Reply via email to