[1/2] incubator-impala git commit: IMPALA-5220: memory maintenance cleanup

mjacobs Wed, 10 May 2017 20:32:47 -0700

Repository: incubator-impala
Updated Branches:
  refs/heads/master e7cb80b66 -> 060b80fd9



IMPALA-5220: memory maintenance cleanup

Remove logic that tries to release pages from TcMalloc's page heap:
TCMalloc's behaviour changed so that it automatically does this with
"aggressive decommit" and committed spans can't accumulate in the page
heap.

Also increase the memory maintenance interval - 1s is too aggressive and
can free memory that will be imminently reused by a running query, e.g.
particularly buffer pool buffers.

Testing:
Tried to reproduce IMPALA-2800 in a couple of ways:
* Ran a big aggregation locally and cancelled it
* Looked at memz/ of some live clusters (production and stress test).
In all cases "Bytes in page heap freelist" was 0.

This confirms that IMPALA-2800 was already solved, probably
by the gperftools 2.5 upgrade, where aggressive decommit
would mean that memory is released to the system in
free() instead of the ReleaseFreeMemory() callst.

I was able to confirm that the ReleaseFreeMemory() calls were unnecessary
to avoid retaining memory by running a couple of stress tests
locally with this patch and checking that "Bytes in page
heap freelist" was 0 after the change and that memory consumption
was generally sensible.

Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701
Reviewed-on: http://gerrit.cloudera.org:8080/6626
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/84110fc9
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/84110fc9
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/84110fc9

Branch: refs/heads/master
Commit: 84110fc90b1d51410a28365d13d03f4ba1aec6fe
Parents: e7cb80b
Author: Tim Armstrong <[email protected]>
Authored: Wed Apr 12 17:36:08 2017 -0700
Committer: Impala Public Jenkins <[email protected]>
Committed: Thu May 11 00:45:03 2017 +0000

----------------------------------------------------------------------
 be/src/common/init.cc         | 38 +++++++-------------------------------
 be/src/runtime/exec-env.cc    | 21 ++++++++-------------
 be/src/runtime/mem-tracker.cc | 20 +-------------------
 be/src/runtime/mem-tracker.h  | 34 ++++++++--------------------------
 be/src/util/memory-metrics.h  |  1 +
 5 files changed, 25 insertions(+), 89 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/84110fc9/be/src/common/init.cc
----------------------------------------------------------------------
diff --git a/be/src/common/init.cc b/be/src/common/init.cc
index 4c90fa3..19b2755 100644
--- a/be/src/common/init.cc
+++ b/be/src/common/init.cc
@@ -40,6 +40,7 @@
 #include "util/disk-info.h"
 #include "util/logging-support.h"
 #include "util/mem-info.h"
+#include "util/memory-metrics.h"
 #include "util/minidump.h"
 #include "util/network-util.h"
 #include "util/openssl-util.h"
@@ -69,7 +70,7 @@ DEFINE_int32(max_audit_event_log_files, 0, "Maximum number of 
audit event log fi
     "to retain. The most recent audit event log files are retained. If set to 
0, "
     "all audit event log files are retained.");
 
-DEFINE_int32(memory_maintenance_sleep_time_ms, 1000, "Sleep time in 
milliseconds "
+DEFINE_int32(memory_maintenance_sleep_time_ms, 10000, "Sleep time in 
milliseconds "
     "between memory maintenance iterations");
 
 DEFINE_int64(pause_monitor_sleep_time_ms, 500, "Sleep time in milliseconds for 
"
@@ -89,11 +90,6 @@ DEFINE_string(local_library_dir, "/tmp",
 // in ways the authors of redaction rules can't anticipate.
 DECLARE_string(vmodule);
 
-// tcmalloc will hold on to freed memory. We will periodically release the 
memory back
-// to the OS if the extra memory is too high. If the memory used by the 
application
-// is less than this fraction of the total reserved memory, free it back to 
the OS.
-static const float TCMALLOC_RELEASE_FREE_MEMORY_FRACTION = 0.5f;
-
 using std::string;
 
 // Log maintenance thread that runs periodically. It flushes glog every 
logbufsecs sec.
@@ -137,36 +133,16 @@ static scoped_ptr<impala::Thread> pause_monitor;
     if (buffer_pool != nullptr) buffer_pool->Maintenance();
 
 #ifndef ADDRESS_SANITIZER
-    // Required to ensure memory gets released back to the OS, even if 
tcmalloc doesn't do
-    // it for us. This is because tcmalloc releases memory based on the
-    // TCMALLOC_RELEASE_RATE property, which is not actually a rate but a 
divisor based
-    // on the number of blocks that have been deleted. When tcmalloc does 
decide to
-    // release memory, it removes a single span from the PageHeap. This means 
there are
-    // certain allocation patterns that can lead to OOM due to not enough 
memory being
-    // released by tcmalloc, even when that memory is no longer being used.
-    // One example is continually resizing a vector which results in many 
allocations.
-    // Even after the vector goes out of scope, all the memory will not be 
released
-    // unless there are enough other deletions that are occurring in the 
system.
-    // This can eventually lead to OOM/crashes (see IMPALA-818).
-    // See: 
http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html#runtime
-    size_t bytes_used = 0;
-    size_t bytes_in_pageheap = 0;
-    MallocExtension::instance()->GetNumericProperty(
-        "generic.current_allocated_bytes", &bytes_used);
-    MallocExtension::instance()->GetNumericProperty(
-        "generic.heap_size", &bytes_in_pageheap);
-    if (bytes_used < bytes_in_pageheap * 
TCMALLOC_RELEASE_FREE_MEMORY_FRACTION) {
-      MallocExtension::instance()->ReleaseFreeMemory();
-    }
-
     // When using tcmalloc, the process limit as measured by our trackers will
-    // be out of sync with the process usage. Update the process tracker 
periodically.
+    // be out of sync with the process usage. The metric is refreshed whenever
+    // memory is consumed or released via a MemTracker, so on a system with
+    // queries executing it will be refreshed frequently. However if the system
+    // is idle, we need to refresh the tracker occasionally since untracked
+    // memory may be allocated or freed, e.g. by background threads.
     if (env != NULL && env->process_mem_tracker() != NULL) {
       env->process_mem_tracker()->RefreshConsumptionFromMetric();
     }
 #endif
-    // TODO: we should also update the process mem tracker with the reported 
JVM
-    // mem usage.
   }
 }
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/84110fc9/be/src/runtime/exec-env.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/exec-env.cc b/be/src/runtime/exec-env.cc
index 6eeeaee..d98b9d7 100644
--- a/be/src/runtime/exec-env.cc
+++ b/be/src/runtime/exec-env.cc
@@ -315,6 +315,14 @@ Status ExecEnv::StartServices() {
   // Limit of -1 means no memory limit.
   mem_tracker_.reset(new MemTracker(
       AggregateMemoryMetric::TOTAL_USED, bytes_limit > 0 ? bytes_limit : -1, 
"Process"));
+  // Aggressive decommit is required so that unused pages in the TCMalloc page 
heap are
+  // not backed by physical pages and do not contribute towards memory 
consumption.
+  size_t aggressive_decommit_enabled = 0;
+  MallocExtension::instance()->GetNumericProperty(
+      "tcmalloc.aggressive_memory_decommit", &aggressive_decommit_enabled);
+  if (!aggressive_decommit_enabled) {
+    return Status("TCMalloc aggressive decommit is required but is disabled.");
+  }
 #else
   // tcmalloc metrics aren't defined in ASAN builds, just use the default 
behavior to
   // track process memory usage (sum of all children trackers).
@@ -337,19 +345,6 @@ Status ExecEnv::StartServices() {
   mem_tracker_->AddGcFunction(
       [this](int64_t bytes_to_free) { 
disk_io_mgr_->GcIoBuffers(bytes_to_free); });
 
-  // TODO: IMPALA-3200: register BufferPool::ReleaseMemory() as GC function.
-
-#ifndef ADDRESS_SANITIZER
-  // Since tcmalloc does not free unused memory, we may exceed the process mem 
limit even
-  // if Impala is not actually using that much memory. Add a callback to free 
any unused
-  // memory if we hit the process limit. TCMalloc GC must run last, because 
other GC
-  // functions may have released memory to TCMalloc, and TCMalloc may have 
cached it
-  // instead of releasing it to the system.
-  mem_tracker_->AddGcFunction([](int64_t bytes_to_free) {
-    MallocExtension::instance()->ReleaseToSystem(bytes_to_free);
-  });
-#endif
-
   // Start services in order to ensure that dependencies between them are met
   if (enable_webserver_) {
     AddDefaultUrlCallbacks(webserver_.get(), mem_tracker_.get(), 
metrics_.get());

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/84110fc9/be/src/runtime/mem-tracker.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/mem-tracker.cc b/be/src/runtime/mem-tracker.cc
index d2dceb3..bd2f039 100644
--- a/be/src/runtime/mem-tracker.cc
+++ b/be/src/runtime/mem-tracker.cc
@@ -39,8 +39,6 @@ namespace impala {
 
 const string MemTracker::COUNTER_NAME = "PeakMemoryUsage";
 
-AtomicInt64 MemTracker::released_memory_since_gc_;
-
 // Name for request pool MemTrackers. '$0' is replaced with the pool name.
 const string REQUEST_POOL_MEM_TRACKER_LABEL_FORMAT = "RequestPool=$0";
 
@@ -209,12 +207,6 @@ void MemTracker::RegisterMetrics(MetricGroup* metrics, 
const string& prefix) {
   limit_metric_ = metrics->AddGauge<int64_t>(Substitute("$0.limit", prefix), 
limit_);
 }
 
-void MemTracker::RefreshConsumptionFromMetric() {
-  DCHECK(consumption_metric_ != NULL);
-  DCHECK(parent_ == NULL);
-  consumption_->Set(consumption_metric_->value());
-}
-
 // Calling this on the query tracker results in output like:
 //
 //  Query(4a4c81fedaed337d:4acadfda00000000) Limit=10.00 GB Total=508.28 MB 
Peak=508.45 MB
@@ -345,7 +337,7 @@ void MemTracker::AddGcFunction(GcFunction f) {
 bool MemTracker::GcMemory(int64_t max_consumption) {
   if (max_consumption < 0) return true;
   lock_guard<mutex> l(gc_lock_);
-  if (consumption_metric_ != NULL) 
consumption_->Set(consumption_metric_->value());
+  if (consumption_metric_ != NULL) RefreshConsumptionFromMetric();
   int64_t pre_gc_consumption = consumption();
   // Check if someone gc'd before us
   if (pre_gc_consumption < max_consumption) return false;
@@ -370,14 +362,4 @@ bool MemTracker::GcMemory(int64_t max_consumption) {
   }
   return curr_consumption > max_consumption;
 }
-
-void MemTracker::GcTcmalloc() {
-#ifndef ADDRESS_SANITIZER
-  released_memory_since_gc_.Store(0);
-  MallocExtension::instance()->ReleaseFreeMemory();
-#else
-  // Nothing to do if not using tcmalloc.
-#endif
-}
-
 }

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/84110fc9/be/src/runtime/mem-tracker.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/mem-tracker.h b/be/src/runtime/mem-tracker.h
index f962fa0..2edb658 100644
--- a/be/src/runtime/mem-tracker.h
+++ b/be/src/runtime/mem-tracker.h
@@ -195,13 +195,8 @@ class MemTracker {
       return;
     }
 
-    if (UNLIKELY(released_memory_since_gc_.Add(bytes) > GC_RELEASE_SIZE)) {
-      GcTcmalloc();
-    }
-
     if (consumption_metric_ != NULL) {
-      DCHECK(parent_ == NULL);
-      consumption_->Set(consumption_metric_->value());
+      RefreshConsumptionFromMetric();
       return;
     }
     for (std::vector<MemTracker*>::iterator tracker = all_trackers_.begin();
@@ -258,9 +253,13 @@ class MemTracker {
     return result;
   }
 
-  /// Refresh the value of consumption_. Only valid to call if 
consumption_metric_ is not
-  /// null.
-  void RefreshConsumptionFromMetric();
+  /// Refresh the memory consumption value from the consumption metric. Only 
valid to
+  /// call if this tracker has a consumption metric.
+  void RefreshConsumptionFromMetric() {
+    DCHECK(consumption_metric_ != nullptr);
+    DCHECK(parent_ == nullptr);
+    consumption_->Set(consumption_metric_->value());
+  }
 
   int64_t limit() const { return limit_; }
   bool has_limit() const { return limit_ >= 0; }
@@ -336,12 +335,6 @@ class MemTracker {
   /// gc_lock. Updates metrics if initialized.
   bool GcMemory(int64_t max_consumption);
 
-  /// Called when the total release memory is larger than GC_RELEASE_SIZE.
-  /// TcMalloc holds onto released memory and very slowly (if ever) releases 
it back to
-  /// the OS. This is problematic since it is memory we are not constantly 
tracking which
-  /// can cause us to go way over mem limits.
-  void GcTcmalloc();
-
   /// Walks the MemTracker hierarchy and populates all_trackers_ and
   /// limit_trackers_
   void Init();
@@ -354,17 +347,6 @@ class MemTracker {
   static std::string LogUsage(const std::string& prefix,
       const std::list<MemTracker*>& trackers, int64_t* logged_consumption);
 
-  /// Size, in bytes, that is considered a large value for Release() (or 
Consume() with
-  /// a negative value). If tcmalloc is used, this can trigger it to GC.
-  /// A higher value will make us call into tcmalloc less often (and therefore 
more
-  /// efficient). A lower value will mean our memory overhead is lower.
-  /// TODO: this is a stopgap.
-  static const int64_t GC_RELEASE_SIZE = 128 * 1024L * 1024L;
-
-  /// Total amount of memory from calls to Release() since the last GC. If this
-  /// is greater than GC_RELEASE_SIZE, this will trigger a tcmalloc gc.
-  static AtomicInt64 released_memory_since_gc_;
-
   /// Lock to protect GcMemory(). This prevents many GCs from occurring at 
once.
   boost::mutex gc_lock_;
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/84110fc9/be/src/util/memory-metrics.h
----------------------------------------------------------------------
diff --git a/be/src/util/memory-metrics.h b/be/src/util/memory-metrics.h
index 83de52f..2527735 100644
--- a/be/src/util/memory-metrics.h
+++ b/be/src/util/memory-metrics.h
@@ -40,6 +40,7 @@ class AggregateMemoryMetric {
   /// Approximates the total amount of physical memory consumed by the backend 
(i.e. not
   /// including JVM memory), which is either in use by queries or cached by 
the BufferPool
   /// or TcMalloc. NULL when running under ASAN.
+  /// TODO: IMPALA-691 - consider changing this to include JVM memory.
   static SumGauge<uint64_t>* TOTAL_USED;
 };

[1/2] incubator-impala git commit: IMPALA-5220: memory maintenance cleanup

Reply via email to