Repository: incubator-impala Updated Branches: refs/heads/master e7cb80b66 -> 060b80fd9
IMPALA-5220: memory maintenance cleanup Remove logic that tries to release pages from TcMalloc's page heap: TCMalloc's behaviour changed so that it automatically does this with "aggressive decommit" and committed spans can't accumulate in the page heap. Also increase the memory maintenance interval - 1s is too aggressive and can free memory that will be imminently reused by a running query, e.g. particularly buffer pool buffers. Testing: Tried to reproduce IMPALA-2800 in a couple of ways: * Ran a big aggregation locally and cancelled it * Looked at memz/ of some live clusters (production and stress test). In all cases "Bytes in page heap freelist" was 0. This confirms that IMPALA-2800 was already solved, probably by the gperftools 2.5 upgrade, where aggressive decommit would mean that memory is released to the system in free() instead of the ReleaseFreeMemory() callst. I was able to confirm that the ReleaseFreeMemory() calls were unnecessary to avoid retaining memory by running a couple of stress tests locally with this patch and checking that "Bytes in page heap freelist" was 0 after the change and that memory consumption was generally sensible. Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 Reviewed-on: http://gerrit.cloudera.org:8080/6626 Reviewed-by: Tim Armstrong <[email protected]> Tested-by: Impala Public Jenkins Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/84110fc9 Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/84110fc9 Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/84110fc9 Branch: refs/heads/master Commit: 84110fc90b1d51410a28365d13d03f4ba1aec6fe Parents: e7cb80b Author: Tim Armstrong <[email protected]> Authored: Wed Apr 12 17:36:08 2017 -0700 Committer: Impala Public Jenkins <[email protected]> Committed: Thu May 11 00:45:03 2017 +0000 ---------------------------------------------------------------------- be/src/common/init.cc | 38 +++++++------------------------------- be/src/runtime/exec-env.cc | 21 ++++++++------------- be/src/runtime/mem-tracker.cc | 20 +------------------- be/src/runtime/mem-tracker.h | 34 ++++++++-------------------------- be/src/util/memory-metrics.h | 1 + 5 files changed, 25 insertions(+), 89 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/84110fc9/be/src/common/init.cc ---------------------------------------------------------------------- diff --git a/be/src/common/init.cc b/be/src/common/init.cc index 4c90fa3..19b2755 100644 --- a/be/src/common/init.cc +++ b/be/src/common/init.cc @@ -40,6 +40,7 @@ #include "util/disk-info.h" #include "util/logging-support.h" #include "util/mem-info.h" +#include "util/memory-metrics.h" #include "util/minidump.h" #include "util/network-util.h" #include "util/openssl-util.h" @@ -69,7 +70,7 @@ DEFINE_int32(max_audit_event_log_files, 0, "Maximum number of audit event log fi "to retain. The most recent audit event log files are retained. If set to 0, " "all audit event log files are retained."); -DEFINE_int32(memory_maintenance_sleep_time_ms, 1000, "Sleep time in milliseconds " +DEFINE_int32(memory_maintenance_sleep_time_ms, 10000, "Sleep time in milliseconds " "between memory maintenance iterations"); DEFINE_int64(pause_monitor_sleep_time_ms, 500, "Sleep time in milliseconds for " @@ -89,11 +90,6 @@ DEFINE_string(local_library_dir, "/tmp", // in ways the authors of redaction rules can't anticipate. DECLARE_string(vmodule); -// tcmalloc will hold on to freed memory. We will periodically release the memory back -// to the OS if the extra memory is too high. If the memory used by the application -// is less than this fraction of the total reserved memory, free it back to the OS. -static const float TCMALLOC_RELEASE_FREE_MEMORY_FRACTION = 0.5f; - using std::string; // Log maintenance thread that runs periodically. It flushes glog every logbufsecs sec. @@ -137,36 +133,16 @@ static scoped_ptr<impala::Thread> pause_monitor; if (buffer_pool != nullptr) buffer_pool->Maintenance(); #ifndef ADDRESS_SANITIZER - // Required to ensure memory gets released back to the OS, even if tcmalloc doesn't do - // it for us. This is because tcmalloc releases memory based on the - // TCMALLOC_RELEASE_RATE property, which is not actually a rate but a divisor based - // on the number of blocks that have been deleted. When tcmalloc does decide to - // release memory, it removes a single span from the PageHeap. This means there are - // certain allocation patterns that can lead to OOM due to not enough memory being - // released by tcmalloc, even when that memory is no longer being used. - // One example is continually resizing a vector which results in many allocations. - // Even after the vector goes out of scope, all the memory will not be released - // unless there are enough other deletions that are occurring in the system. - // This can eventually lead to OOM/crashes (see IMPALA-818). - // See: http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html#runtime - size_t bytes_used = 0; - size_t bytes_in_pageheap = 0; - MallocExtension::instance()->GetNumericProperty( - "generic.current_allocated_bytes", &bytes_used); - MallocExtension::instance()->GetNumericProperty( - "generic.heap_size", &bytes_in_pageheap); - if (bytes_used < bytes_in_pageheap * TCMALLOC_RELEASE_FREE_MEMORY_FRACTION) { - MallocExtension::instance()->ReleaseFreeMemory(); - } - // When using tcmalloc, the process limit as measured by our trackers will - // be out of sync with the process usage. Update the process tracker periodically. + // be out of sync with the process usage. The metric is refreshed whenever + // memory is consumed or released via a MemTracker, so on a system with + // queries executing it will be refreshed frequently. However if the system + // is idle, we need to refresh the tracker occasionally since untracked + // memory may be allocated or freed, e.g. by background threads. if (env != NULL && env->process_mem_tracker() != NULL) { env->process_mem_tracker()->RefreshConsumptionFromMetric(); } #endif - // TODO: we should also update the process mem tracker with the reported JVM - // mem usage. } } http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/84110fc9/be/src/runtime/exec-env.cc ---------------------------------------------------------------------- diff --git a/be/src/runtime/exec-env.cc b/be/src/runtime/exec-env.cc index 6eeeaee..d98b9d7 100644 --- a/be/src/runtime/exec-env.cc +++ b/be/src/runtime/exec-env.cc @@ -315,6 +315,14 @@ Status ExecEnv::StartServices() { // Limit of -1 means no memory limit. mem_tracker_.reset(new MemTracker( AggregateMemoryMetric::TOTAL_USED, bytes_limit > 0 ? bytes_limit : -1, "Process")); + // Aggressive decommit is required so that unused pages in the TCMalloc page heap are + // not backed by physical pages and do not contribute towards memory consumption. + size_t aggressive_decommit_enabled = 0; + MallocExtension::instance()->GetNumericProperty( + "tcmalloc.aggressive_memory_decommit", &aggressive_decommit_enabled); + if (!aggressive_decommit_enabled) { + return Status("TCMalloc aggressive decommit is required but is disabled."); + } #else // tcmalloc metrics aren't defined in ASAN builds, just use the default behavior to // track process memory usage (sum of all children trackers). @@ -337,19 +345,6 @@ Status ExecEnv::StartServices() { mem_tracker_->AddGcFunction( [this](int64_t bytes_to_free) { disk_io_mgr_->GcIoBuffers(bytes_to_free); }); - // TODO: IMPALA-3200: register BufferPool::ReleaseMemory() as GC function. - -#ifndef ADDRESS_SANITIZER - // Since tcmalloc does not free unused memory, we may exceed the process mem limit even - // if Impala is not actually using that much memory. Add a callback to free any unused - // memory if we hit the process limit. TCMalloc GC must run last, because other GC - // functions may have released memory to TCMalloc, and TCMalloc may have cached it - // instead of releasing it to the system. - mem_tracker_->AddGcFunction([](int64_t bytes_to_free) { - MallocExtension::instance()->ReleaseToSystem(bytes_to_free); - }); -#endif - // Start services in order to ensure that dependencies between them are met if (enable_webserver_) { AddDefaultUrlCallbacks(webserver_.get(), mem_tracker_.get(), metrics_.get()); http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/84110fc9/be/src/runtime/mem-tracker.cc ---------------------------------------------------------------------- diff --git a/be/src/runtime/mem-tracker.cc b/be/src/runtime/mem-tracker.cc index d2dceb3..bd2f039 100644 --- a/be/src/runtime/mem-tracker.cc +++ b/be/src/runtime/mem-tracker.cc @@ -39,8 +39,6 @@ namespace impala { const string MemTracker::COUNTER_NAME = "PeakMemoryUsage"; -AtomicInt64 MemTracker::released_memory_since_gc_; - // Name for request pool MemTrackers. '$0' is replaced with the pool name. const string REQUEST_POOL_MEM_TRACKER_LABEL_FORMAT = "RequestPool=$0"; @@ -209,12 +207,6 @@ void MemTracker::RegisterMetrics(MetricGroup* metrics, const string& prefix) { limit_metric_ = metrics->AddGauge<int64_t>(Substitute("$0.limit", prefix), limit_); } -void MemTracker::RefreshConsumptionFromMetric() { - DCHECK(consumption_metric_ != NULL); - DCHECK(parent_ == NULL); - consumption_->Set(consumption_metric_->value()); -} - // Calling this on the query tracker results in output like: // // Query(4a4c81fedaed337d:4acadfda00000000) Limit=10.00 GB Total=508.28 MB Peak=508.45 MB @@ -345,7 +337,7 @@ void MemTracker::AddGcFunction(GcFunction f) { bool MemTracker::GcMemory(int64_t max_consumption) { if (max_consumption < 0) return true; lock_guard<mutex> l(gc_lock_); - if (consumption_metric_ != NULL) consumption_->Set(consumption_metric_->value()); + if (consumption_metric_ != NULL) RefreshConsumptionFromMetric(); int64_t pre_gc_consumption = consumption(); // Check if someone gc'd before us if (pre_gc_consumption < max_consumption) return false; @@ -370,14 +362,4 @@ bool MemTracker::GcMemory(int64_t max_consumption) { } return curr_consumption > max_consumption; } - -void MemTracker::GcTcmalloc() { -#ifndef ADDRESS_SANITIZER - released_memory_since_gc_.Store(0); - MallocExtension::instance()->ReleaseFreeMemory(); -#else - // Nothing to do if not using tcmalloc. -#endif -} - } http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/84110fc9/be/src/runtime/mem-tracker.h ---------------------------------------------------------------------- diff --git a/be/src/runtime/mem-tracker.h b/be/src/runtime/mem-tracker.h index f962fa0..2edb658 100644 --- a/be/src/runtime/mem-tracker.h +++ b/be/src/runtime/mem-tracker.h @@ -195,13 +195,8 @@ class MemTracker { return; } - if (UNLIKELY(released_memory_since_gc_.Add(bytes) > GC_RELEASE_SIZE)) { - GcTcmalloc(); - } - if (consumption_metric_ != NULL) { - DCHECK(parent_ == NULL); - consumption_->Set(consumption_metric_->value()); + RefreshConsumptionFromMetric(); return; } for (std::vector<MemTracker*>::iterator tracker = all_trackers_.begin(); @@ -258,9 +253,13 @@ class MemTracker { return result; } - /// Refresh the value of consumption_. Only valid to call if consumption_metric_ is not - /// null. - void RefreshConsumptionFromMetric(); + /// Refresh the memory consumption value from the consumption metric. Only valid to + /// call if this tracker has a consumption metric. + void RefreshConsumptionFromMetric() { + DCHECK(consumption_metric_ != nullptr); + DCHECK(parent_ == nullptr); + consumption_->Set(consumption_metric_->value()); + } int64_t limit() const { return limit_; } bool has_limit() const { return limit_ >= 0; } @@ -336,12 +335,6 @@ class MemTracker { /// gc_lock. Updates metrics if initialized. bool GcMemory(int64_t max_consumption); - /// Called when the total release memory is larger than GC_RELEASE_SIZE. - /// TcMalloc holds onto released memory and very slowly (if ever) releases it back to - /// the OS. This is problematic since it is memory we are not constantly tracking which - /// can cause us to go way over mem limits. - void GcTcmalloc(); - /// Walks the MemTracker hierarchy and populates all_trackers_ and /// limit_trackers_ void Init(); @@ -354,17 +347,6 @@ class MemTracker { static std::string LogUsage(const std::string& prefix, const std::list<MemTracker*>& trackers, int64_t* logged_consumption); - /// Size, in bytes, that is considered a large value for Release() (or Consume() with - /// a negative value). If tcmalloc is used, this can trigger it to GC. - /// A higher value will make us call into tcmalloc less often (and therefore more - /// efficient). A lower value will mean our memory overhead is lower. - /// TODO: this is a stopgap. - static const int64_t GC_RELEASE_SIZE = 128 * 1024L * 1024L; - - /// Total amount of memory from calls to Release() since the last GC. If this - /// is greater than GC_RELEASE_SIZE, this will trigger a tcmalloc gc. - static AtomicInt64 released_memory_since_gc_; - /// Lock to protect GcMemory(). This prevents many GCs from occurring at once. boost::mutex gc_lock_; http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/84110fc9/be/src/util/memory-metrics.h ---------------------------------------------------------------------- diff --git a/be/src/util/memory-metrics.h b/be/src/util/memory-metrics.h index 83de52f..2527735 100644 --- a/be/src/util/memory-metrics.h +++ b/be/src/util/memory-metrics.h @@ -40,6 +40,7 @@ class AggregateMemoryMetric { /// Approximates the total amount of physical memory consumed by the backend (i.e. not /// including JVM memory), which is either in use by queries or cached by the BufferPool /// or TcMalloc. NULL when running under ASAN. + /// TODO: IMPALA-691 - consider changing this to include JVM memory. static SumGauge<uint64_t>* TOTAL_USED; };
