[incubator-doris] branch master updated: [Config] Add new BE config for tcmalloc (#3732)

morningman Wed, 03 Jun 2020 06:59:01 -0700

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git



The following commit(s) were added to refs/heads/master by this push:
     new 2ad1b20  [Config] Add new BE config for tcmalloc (#3732)
2ad1b20 is described below

commit 2ad1b20b243330827093569c9c5d97c59a42056f
Author: Mingyu Chen <[email protected]>
AuthorDate: Wed Jun 3 21:58:13 2020 +0800

    [Config] Add new BE config for tcmalloc (#3732)
    
    Add a new BE config tc_max_total_thread_cache_bytes
---
 be/src/common/config.h                             |  10 +
 be/src/service/doris_main.cpp                      |  11 +-
 docs/en/administrator-guide/config/be_config.md    | 373 +++++++++++----------
 docs/zh-CN/administrator-guide/config/be_config.md |   8 +
 4 files changed, 218 insertions(+), 184 deletions(-)

diff --git a/be/src/common/config.h b/be/src/common/config.h
index b93b72c..ae2c838 100644
--- a/be/src/common/config.h
+++ b/be/src/common/config.h
@@ -44,6 +44,16 @@ namespace config {
     // free memory rate.[0-100]
     CONF_mInt64(tc_free_memory_rate, "20");
 
+    // Bound on the total amount of bytes allocated to thread caches.
+    // This bound is not strict, so it is possible for the cache to go over this bound
+    // in certain circumstances. This value defaults to 1GB
+    // If you suspect your application is not scaling to many threads due to lock contention in TCMalloc,
+    // you can try increasing this value. This may improve performance, at a cost of extra memory
+    // use by TCMalloc.
+    // reference: https://gperftools.github.io/gperftools/tcmalloc.html: TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
+    //            https://github.com/gperftools/gperftools/issues/1111
+    CONF_Int64(tc_max_total_thread_cache_bytes, "1073741824");
+
     // process memory limit specified as number of bytes
     // ('<int>[bB]?'), megabytes ('<float>[mM]'), gigabytes ('<float>[gG]'),
     // or percentage of the physical memory ('<int>%').
diff --git a/be/src/service/doris_main.cpp b/be/src/service/doris_main.cpp
index 6a2e778..7f8a238 100644
--- a/be/src/service/doris_main.cpp
+++ b/be/src/service/doris_main.cpp
@@ -126,8 +126,15 @@ int main(int argc, char** argv) {
     }
 
 #if !defined(ADDRESS_SANITIZER) && !defined(LEAK_SANITIZER) && !defined(THREAD_SANITIZER)
-    MallocExtension::instance()->SetNumericProperty("tcmalloc.aggressive_memory_decommit",
-                                                    21474836480);
+    // Aggressive decommit is required so that unused pages in the TCMalloc page heap are
+    // not backed by physical pages and do not contribute towards memory consumption.
+    MallocExtension::instance()->SetNumericProperty("tcmalloc.aggressive_memory_decommit", 1);
+    // Change the total TCMalloc thread cache size if necessary.
+    if (!MallocExtension::instance()->SetNumericProperty(
+                "tcmalloc.max_total_thread_cache_bytes", doris::config::tc_max_total_thread_cache_bytes)) {
+        fprintf(stderr, "Failed to change TCMalloc total thread cache size.\n");
+        return -1;
+    }
 #endif
 
     std::vector<doris::StorePath> paths;
diff --git a/docs/en/administrator-guide/config/be_config.md b/docs/en/administrator-guide/config/be_config.md
index da4b3bd..36d45ad 100644
--- a/docs/en/administrator-guide/config/be_config.md
+++ b/docs/en/administrator-guide/config/be_config.md
@@ -41,25 +41,25 @@ This document mainly introduces the relevant configuration items of BE.
 
 ## Configurations
 
-### alter_tablet_worker_count
+### `alter_tablet_worker_count`
 
-### base_compaction_check_interval_seconds
+### `base_compaction_check_interval_seconds`
 
-### base_compaction_interval_seconds_since_last_operation
+### `base_compaction_interval_seconds_since_last_operation`
 
-### base_compaction_num_cumulative_deltas
+### `base_compaction_num_cumulative_deltas`
 
-### base_compaction_num_threads_per_disk
+### `base_compaction_num_threads_per_disk`
 
-### base_compaction_write_mbytes_per_sec
+### `base_compaction_write_mbytes_per_sec`
 
-### base_cumulative_delta_ratio
+### `base_cumulative_delta_ratio`
 
-### be_port
+### `be_port`
 
-### be_service_threads
+### `be_service_threads`
 
-### brpc_max_body_size
+### `brpc_max_body_size`
 
 This configuration is mainly used to modify the parameter `max_body_size` of brpc.
 
@@ -77,340 +77,349 @@ Sometimes the query fails and an error message of `The server is overcrowded` wi
 
 Since this is a brpc configuration, users can also modify this parameter directly during operation. Modify by visiting `http://be_host:brpc_port/flags`.
 
-### brpc_port
+### `brpc_port`
 
-### buffer_pool_clean_pages_limit
+### `buffer_pool_clean_pages_limit`
 
-### buffer_pool_limit
+### `buffer_pool_limit`
 
-### check_consistency_worker_count
+### `check_consistency_worker_count`
 
-### chunk_reserved_bytes_limit
+### `chunk_reserved_bytes_limit`
 
-### clear_transaction_task_worker_count
+### `clear_transaction_task_worker_count`
 
-### clone_worker_count
+### `clone_worker_count`
 
-### cluster_id
+### `cluster_id`
 
-### column_dictionary_key_ratio_threshold
+### `column_dictionary_key_ratio_threshold`
 
-### column_dictionary_key_size_threshold
+### `column_dictionary_key_size_threshold`
 
-### compress_rowbatches
+### `compress_rowbatches`
 
-### create_tablet_worker_count
+### `create_tablet_worker_count`
 
-### cumulative_compaction_budgeted_bytes
+### `cumulative_compaction_budgeted_bytes`
 
-### cumulative_compaction_check_interval_seconds
+### `cumulative_compaction_check_interval_seconds`
 
-### cumulative_compaction_num_threads_per_disk
+### `cumulative_compaction_num_threads_per_disk`
 
-### cumulative_compaction_skip_window_seconds
+### `cumulative_compaction_skip_window_seconds`
 
-### default_num_rows_per_column_file_block
+### `default_num_rows_per_column_file_block`
 
-### default_query_options
+### `default_query_options`
 
-### default_rowset_type
+### `default_rowset_type`
 
-### delete_worker_count
+### `delete_worker_count`
 
-### disable_mem_pools
+### `disable_mem_pools`
 
-### disable_storage_page_cache
+### `disable_storage_page_cache`
 
-### disk_stat_monitor_interval
+### `disk_stat_monitor_interval`
 
-### doris_cgroups
+### `doris_cgroups`
 
-### doris_max_pushdown_conjuncts_return_rate
+### `doris_max_pushdown_conjuncts_return_rate`
 
-### doris_max_scan_key_num
+### `doris_max_scan_key_num`
 
-### doris_scan_range_row_count
+### `doris_scan_range_row_count`
 
-### doris_scanner_queue_size
+### `doris_scanner_queue_size`
 
-### doris_scanner_row_num
+### `doris_scanner_row_num`
 
-### doris_scanner_thread_pool_queue_size
+### `doris_scanner_thread_pool_queue_size`
 
-### doris_scanner_thread_pool_thread_num
+### `doris_scanner_thread_pool_thread_num`
 
-### download_low_speed_limit_kbps
+### `download_low_speed_limit_kbps`
 
-### download_low_speed_time
+### `download_low_speed_time`
 
-### download_worker_count
+### `download_worker_count`
 
-### drop_tablet_worker_count
+### `drop_tablet_worker_count`
 
-### enable_metric_calculator
+### `enable_metric_calculator`
 
-### enable_partitioned_aggregation
+### `enable_partitioned_aggregation`
 
-### enable_prefetch
+### `enable_prefetch`
 
-### enable_quadratic_probing
+### `enable_quadratic_probing`
 
-### enable_system_metrics
+### `enable_system_metrics`
 
-### enable_token_check
+### `enable_token_check`
 
-### es_http_timeout_ms
+### `es_http_timeout_ms`
 
-### es_scroll_keepalive
+### `es_scroll_keepalive`
 
-### etl_thread_pool_queue_size
+### `etl_thread_pool_queue_size`
 
-### etl_thread_pool_size
+### `etl_thread_pool_size`
 
-### exchg_node_buffer_size_bytes
+### `exchg_node_buffer_size_bytes`
 
-### file_descriptor_cache_capacity
+### `file_descriptor_cache_capacity`
 
-### file_descriptor_cache_clean_interval
+### `file_descriptor_cache_clean_interval`
 
-### flush_thread_num_per_store
+### `flush_thread_num_per_store`
 
-### force_recovery
+### `force_recovery`
 
-### fragment_pool_queue_size
+### `fragment_pool_queue_size`
 
-### fragment_pool_thread_num
+### `fragment_pool_thread_num`
 
-### heartbeat_service_port
+### `heartbeat_service_port`
 
-### heartbeat_service_thread_count
+### `heartbeat_service_thread_count`
 
-### ignore_broken_disk
+### `ignore_broken_disk`
 
-### inc_rowset_expired_sec
+### `ignore_load_tablet_failure`
 
-### index_stream_cache_capacity
+* Type: boolean
+* Description: Whether to continue to start be when load tablet from header failed.
+* Default: false
 
-### load_data_reserve_hours
+When the BE starts, it will start a separate thread for each data directory to load the tablet header meta information. In the default configuration, if a tablet fails to load its header, the startup process is terminated. At the same time, you will see the following error message in the `be.INFO`:
 
-### load_error_log_reserve_hours
+```
+load tablets from header failed, failed tablets size: xxx, path=xxx
+```
 
-### load_process_max_memory_limit_bytes
+Indicates how many tablets in this data directory failed to load. At the same time, the log will also have specific information about the tablet that failed to load. In this case, manual intervention is required to troubleshoot the cause of the error. After troubleshooting, there are usually two ways to recover:
 
-### load_process_max_memory_limit_percent
+1. If the tablet information is not repairable, you can delete the wrong tablet through the `meta_tool` tool under the condition that other copies are normal.
+2. Set `ignore_load_tablet_failure` to true, BE will ignore these wrong tablets and start normally.
 
-### local_library_dir
+### `inc_rowset_expired_sec`
 
-### log_buffer_level
+### `index_stream_cache_capacity`
 
-### madvise_huge_pages
+### `load_data_reserve_hours`
 
-### make_snapshot_worker_count
+### `load_error_log_reserve_hours`
 
-### max_client_cache_size_per_host
+### `load_process_max_memory_limit_bytes`
 
-### max_compaction_concurrency
+### `load_process_max_memory_limit_percent`
 
-### max_consumer_num_per_group
+### `local_library_dir`
 
-### max_cumulative_compaction_num_singleton_deltas
+### `log_buffer_level`
 
-### max_download_speed_kbps
+### `madvise_huge_pages`
 
-### max_free_io_buffers
+### `make_snapshot_worker_count`
 
-### max_garbage_sweep_interval
+### `max_client_cache_size_per_host`
 
-### max_memory_sink_batch_count
+### `max_compaction_concurrency`
 
-### max_percentage_of_error_disk
+### `max_consumer_num_per_group`
 
-### max_runnings_transactions_per_txn_map
+### `max_cumulative_compaction_num_singleton_deltas`
 
-### max_tablet_num_per_shard
+### `max_download_speed_kbps`
 
-### mem_limit
+### `max_free_io_buffers`
 
-### memory_limitation_per_thread_for_schema_change
+### `max_garbage_sweep_interval`
 
-### memory_maintenance_sleep_time_s
+### `max_memory_sink_batch_count`
 
-### memory_max_alignment
+### `max_percentage_of_error_disk`
 
-### min_buffer_size
+### `max_runnings_transactions_per_txn_map`
 
-### min_compaction_failure_interval_sec
+### `max_tablet_num_per_shard`
 
-### min_cumulative_compaction_num_singleton_deltas
+### `mem_limit`
 
-### min_file_descriptor_number
+### `memory_limitation_per_thread_for_schema_change`
 
-### min_garbage_sweep_interval
+### `memory_maintenance_sleep_time_s`
 
-### mmap_buffers
+### `memory_max_alignment`
 
-### num_cores
+### `min_buffer_size`
 
-### num_disks
+### `min_compaction_failure_interval_sec`
 
-### num_threads_per_core
+### `min_cumulative_compaction_num_singleton_deltas`
 
-### num_threads_per_disk
+### `min_file_descriptor_number`
 
-### number_tablet_writer_threads
+### `min_garbage_sweep_interval`
 
-### path_gc_check
+### `mmap_buffers`
 
-### path_gc_check_interval_second
+### `num_cores`
 
-### path_gc_check_step
+### `num_disks`
 
-### path_gc_check_step_interval_ms
+### `num_threads_per_core`
 
-### path_scan_interval_second
+### `num_threads_per_disk`
 
-### pending_data_expire_time_sec
+### `number_tablet_writer_threads`
 
-### periodic_counter_update_period_ms
+### `path_gc_check`
 
-### plugin_path
+### `path_gc_check_interval_second`
 
-### port
+### `path_gc_check_step`
 
-### pprof_profile_dir
+### `path_gc_check_step_interval_ms`
 
-### priority_networks
+### `path_scan_interval_second`
 
-### priority_queue_remaining_tasks_increased_frequency
+### `pending_data_expire_time_sec`
 
-### publish_version_worker_count
+### `periodic_counter_update_period_ms`
 
-### pull_load_task_dir
+### `plugin_path`
 
-### push_worker_count_high_priority
+### `port`
 
-### push_worker_count_normal_priority
+### `pprof_profile_dir`
 
-### push_write_mbytes_per_sec
+### `priority_networks`
 
-### query_scratch_dirs
+### `priority_queue_remaining_tasks_increased_frequency`
 
-### read_size
+### `publish_version_worker_count`
 
-### release_snapshot_worker_count
+### `pull_load_task_dir`
 
-### report_disk_state_interval_seconds
+### `push_worker_count_high_priority`
 
-### report_tablet_interval_seconds
+### `push_worker_count_normal_priority`
 
-### report_task_interval_seconds
+### `push_write_mbytes_per_sec`
 
-### result_buffer_cancelled_interval_time
+### `query_scratch_dirs`
 
-### routine_load_thread_pool_size
+### `read_size`
 
-### row_nums_check
+### `release_snapshot_worker_count`
 
-### scan_context_gc_interval_min
+### `report_disk_state_interval_seconds`
 
-### scratch_dirs
+### `report_tablet_interval_seconds`
 
-### serialize_batch
+### `report_task_interval_seconds`
 
-### sleep_five_seconds
+### `result_buffer_cancelled_interval_time`
 
-### sleep_one_second
+### `routine_load_thread_pool_size`
 
-### small_file_dir
+### `row_nums_check`
 
-### snapshot_expire_time_sec
+### `scan_context_gc_interval_min`
 
-### sorter_block_size
+### `scratch_dirs`
 
-### status_report_interval
+### `serialize_batch`
 
-### storage_flood_stage_left_capacity_bytes
+### `sleep_five_seconds`
 
-### storage_flood_stage_usage_percent
+### `sleep_one_second`
 
-### storage_medium_migrate_count
+### `small_file_dir`
 
-### storage_page_cache_limit
+### `snapshot_expire_time_sec`
 
-### storage_root_path
+### `sorter_block_size`
 
-### streaming_load_max_mb
+### `status_report_interval`
 
-### streaming_load_rpc_max_alive_time_sec
+### `storage_flood_stage_left_capacity_bytes`
 
-### sync_tablet_meta
+### `storage_flood_stage_usage_percent`
 
-### sys_log_dir
+### `storage_medium_migrate_count`
 
-### sys_log_level
+### `storage_page_cache_limit`
 
-### sys_log_roll_mode
+### `storage_root_path`
 
-### sys_log_roll_num
+### `streaming_load_max_mb`
 
-### sys_log_verbose_level
+### `streaming_load_rpc_max_alive_time_sec`
 
-### sys_log_verbose_modules
+### `sync_tablet_meta`
 
-### tablet_map_shard_size
+### `sys_log_dir`
 
-### tablet_meta_checkpoint_min_interval_secs
+### `sys_log_level`
 
-### tablet_meta_checkpoint_min_new_rowsets_num
+### `sys_log_roll_mode`
 
-### tablet_stat_cache_update_interval_second
+### `sys_log_roll_num`
 
-### tablet_writer_open_rpc_timeout_sec
+### `sys_log_verbose_level`
 
-### tc_free_memory_rate
+### `sys_log_verbose_modules`
 
-### tc_use_memory_min
+### `tablet_map_shard_size`
 
-### thrift_connect_timeout_seconds
+### `tablet_meta_checkpoint_min_interval_secs`
 
-### thrift_rpc_timeout_ms
+### `tablet_meta_checkpoint_min_new_rowsets_num`
 
-### trash_file_expire_time_sec
+### `tablet_stat_cache_update_interval_second`
 
-### txn_commit_rpc_timeout_ms
+### `tablet_writer_open_rpc_timeout_sec`
 
-### txn_map_shard_size
+### `tc_free_memory_rate`
 
-### txn_shard_size
+### `tc_max_total_thread_cache_bytes`
 
-### unused_rowset_monitor_interval
+* Type: int64
+* Description: Used to limit the total thread cache size in tcmalloc. This limit is not a hard limit, so the actual thread cache usage may exceed this limit. For details, please refer to [TCMALLOC\_MAX\_TOTAL\_THREAD\_CACHE\_BYTES](https://gperftools.github.io/gperftools/tcmalloc.html)
+* Default: 1073741824
 
-### upload_worker_count
+If the system is found to be in a high-stress scenario and a large number of threads are found in the tcmalloc lock competition phase through the BE thread stack, such as a large number of `SpinLock` related stacks, you can try increasing this parameter to improve system performance. [Reference] (https://github.com/gperftools/gperftools/issues/1111)
 
-### use_mmap_allocate_chunk
+### `tc_use_memory_min`
 
-### user_function_dir
+### `thrift_connect_timeout_seconds`
 
-### web_log_bytes
+### `thrift_rpc_timeout_ms`
 
-### webserver_num_workers
+### `trash_file_expire_time_sec`
 
-### webserver_port
+### `txn_commit_rpc_timeout_ms`
 
-### write_buffer_size
+### `txn_map_shard_size`
 
-### ignore_load_tablet_failure
-* Type: boolean
-* Description: Whether to continue to start be when load tablet from header failed.
-* Default: false
+### `txn_shard_size`
 
-When the BE starts, it will start a separate thread for each data directory to load the tablet header meta information. In the default configuration, if a tablet fails to load its header, the startup process is terminated. At the same time, you will see the following error message in the `be.INFO`:
+### `unused_rowset_monitor_interval`
 
-```
-load tablets from header failed, failed tablets size: xxx, path=xxx
-```
+### `upload_worker_count`
 
-Indicates how many tablets in this data directory failed to load. At the same time, the log will also have specific information about the tablet that failed to load. In this case, manual intervention is required to troubleshoot the cause of the error. After troubleshooting, there are usually two ways to recover:
+### `use_mmap_allocate_chunk`
 
-1. If the tablet information is not repairable, you can delete the wrong tablet through the `meta_tool` tool under the condition that other copies are normal.
-2. Set `ignore_load_tablet_failure` to true, BE will ignore these wrong tablets and start normally.
+### `user_function_dir`
+
+### `web_log_bytes`
+
+### `webserver_num_workers`
+
+### `webserver_port`
+
+### `write_buffer_size`
\ No newline at end of file
diff --git a/docs/zh-CN/administrator-guide/config/be_config.md b/docs/zh-CN/administrator-guide/config/be_config.md
index f461335..63d3606 100644
--- a/docs/zh-CN/administrator-guide/config/be_config.md
+++ b/docs/zh-CN/administrator-guide/config/be_config.md
@@ -367,6 +367,14 @@ under the License.
 
 ### `tc_free_memory_rate`
 
+### `tc_max_total_thread_cache_bytes`
+
+* 类型：int64
+* 描述：用来限制 tcmalloc 中总的线程缓存大小。这个限制不是硬限，因此实际线程缓存使用可能超过这个限制。具体可参阅 [TCMALLOC\_MAX\_TOTAL\_THREAD\_CACHE\_BYTES](https://gperftools.github.io/gperftools/tcmalloc.html)
+* 默认值： 1073741824
+
+如果发现系统在高压力场景下，通过 BE 线程堆栈发现大量线程处于 tcmalloc 的锁竞争阶段，如大量的 `SpinLock` 相关堆栈，则可以尝试增大该参数来提升系统性能。[参考](https://github.com/gperftools/gperftools/issues/1111)
+
 ### `tc_use_memory_min`
 
 ### `thrift_connect_timeout_seconds`
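
For readers following the doris_main.cpp hunk, the startup sequence can be sketched roughly as below. This is a minimal illustration and not the code from the commit. `FakeMallocExtension` and `configure_tcmalloc` are hypothetical names: the fake class only mimics the `SetNumericProperty(const char*, size_t)` setter of gperftools' `MallocExtension` so that the example builds without the library. The negative-value guard is an assumption added here, because the config is a signed int64 while the setter takes a `size_t`.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for gperftools' MallocExtension, so the sketch
// compiles without the library. Only the setter used by the patch is mimicked.
class FakeMallocExtension {
public:
    bool SetNumericProperty(const char* property, std::size_t value) {
        const std::string name(property);
        // Accept only the two properties the patch touches.
        if (name != "tcmalloc.aggressive_memory_decommit" &&
            name != "tcmalloc.max_total_thread_cache_bytes") {
            return false;
        }
        values_[name] = value;
        return true;
    }

    // Returns the stored value, or 0 if the property was never set.
    std::size_t Get(const std::string& name) const {
        auto it = values_.find(name);
        return it == values_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, std::size_t> values_;
};

// Hypothetical helper that mirrors the order of calls in the patched main().
// The negative-value check is an addition of this sketch, not part of the commit.
template <typename Extension>
bool configure_tcmalloc(Extension& ext, std::int64_t max_total_thread_cache_bytes) {
    if (max_total_thread_cache_bytes < 0) {
        return false;  // a negative value would wrap to a huge size_t in the cast
    }
    // As in the patch, the result of enabling aggressive decommit is not checked.
    ext.SetNumericProperty("tcmalloc.aggressive_memory_decommit", 1);
    return ext.SetNumericProperty(
            "tcmalloc.max_total_thread_cache_bytes",
            static_cast<std::size_t>(max_total_thread_cache_bytes));
}
```

With the default of 1073741824 (1 GiB) the helper returns true and both properties are recorded; a negative value is rejected before the cast is reached.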


