This is an automated email from the ASF dual-hosted git repository.
dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 9867ba3f3f9 [opt](file_cache) Add config to enable base compaction
output to file cache (#44497)
9867ba3f3f9 is described below
commit 9867ba3f3f96620f28a39123382432f8654483c9
Author: Gavin Chou <[email protected]>
AuthorDate: Mon Nov 25 23:41:07 2024 +0800
[opt](file_cache) Add config to enable base compaction output to file cache
(#44497)
The previous implementation did not allow the output of base compaction
to be written into the file cache, which may incur a performance penalty.
This commit adds a config, be.conf
`enable_file_cache_keep_base_compaction_output`, to make that policy configurable; it is false by default.
If your file cache is ample enough to accommodate all the data in your
database, enable this option; otherwise, it is recommended to leave it
disabled.
---
be/src/common/config.cpp | 5 +++--
be/src/common/config.h | 11 +++++++++--
be/src/olap/compaction.cpp | 8 ++++++--
3 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/be/src/common/config.cpp b/be/src/common/config.cpp
index 2938e81a25e..0cffa30cdca 100644
--- a/be/src/common/config.cpp
+++ b/be/src/common/config.cpp
@@ -959,8 +959,6 @@ DEFINE_Int32(doris_remote_scanner_thread_pool_thread_num,
"48");
// number of s3 scanner thread pool queue size
DEFINE_Int32(doris_remote_scanner_thread_pool_queue_size, "102400");
DEFINE_mInt64(block_cache_wait_timeout_ms, "1000");
-DEFINE_mInt64(cache_lock_long_tail_threshold, "1000");
-DEFINE_Int64(file_cache_recycle_keys_size, "1000000");
// limit the queue of pending batches which will be sent by a single
nodechannel
DEFINE_mInt64(nodechannel_pending_queue_max_bytes, "67108864");
@@ -1054,6 +1052,9 @@ DEFINE_Bool(enable_ttl_cache_evict_using_lru, "true");
DEFINE_mBool(enbale_dump_error_file, "true");
// limit the max size of error log on disk
DEFINE_mInt64(file_cache_error_log_limit_bytes, "209715200"); // 200MB
+DEFINE_mInt64(cache_lock_long_tail_threshold, "1000");
+DEFINE_Int64(file_cache_recycle_keys_size, "1000000");
+DEFINE_mBool(enable_file_cache_keep_base_compaction_output, "false");
DEFINE_mInt32(index_cache_entry_stay_time_after_lookup_s, "1800");
DEFINE_mInt32(inverted_index_cache_stale_sweep_time_sec, "600");
diff --git a/be/src/common/config.h b/be/src/common/config.h
index e6247f596a1..caee1f320c1 100644
--- a/be/src/common/config.h
+++ b/be/src/common/config.h
@@ -1011,8 +1011,6 @@ DECLARE_mInt64(nodechannel_pending_queue_max_bytes);
// The batch size for sending data by brpc streaming client
DECLARE_mInt64(brpc_streaming_client_batch_bytes);
DECLARE_mInt64(block_cache_wait_timeout_ms);
-DECLARE_mInt64(cache_lock_long_tail_threshold);
-DECLARE_Int64(file_cache_recycle_keys_size);
DECLARE_Bool(enable_brpc_builtin_services);
@@ -1095,6 +1093,15 @@ DECLARE_Bool(enable_ttl_cache_evict_using_lru);
DECLARE_mBool(enbale_dump_error_file);
// limit the max size of error log on disk
DECLARE_mInt64(file_cache_error_log_limit_bytes);
+DECLARE_mInt64(cache_lock_long_tail_threshold);
+DECLARE_Int64(file_cache_recycle_keys_size);
+// Base compaction may retrieve and produce some less frequently accessed data,
+// potentially affecting the file cache hit rate.
+// This configuration determines whether to retain the output within the file
cache.
+// Make your choice based on the following considerations:
+// If your file cache is ample enough to accommodate all the data in your
database,
+// enable this option; otherwise, it is recommended to leave it disabled.
+DECLARE_mBool(enable_file_cache_keep_base_compaction_output);
// inverted index searcher cache
// cache entry stay time after lookup
diff --git a/be/src/olap/compaction.cpp b/be/src/olap/compaction.cpp
index d7073491320..68ed0322a9e 100644
--- a/be/src/olap/compaction.cpp
+++ b/be/src/olap/compaction.cpp
@@ -1190,8 +1190,12 @@ Status
CloudCompactionMixin::construct_output_rowset_writer(RowsetWriterContext&
ctx.compaction_level =
_engine.cumu_compaction_policy(compaction_policy)
->new_compaction_level(_input_rowsets);
}
-
- ctx.write_file_cache = compaction_type() ==
ReaderType::READER_CUMULATIVE_COMPACTION;
+ // We presume that the data involved in cumulative compaction is
sufficiently 'hot'
+ // and should always be retained in the cache.
+ // TODO(gavin): Ensure that the retention of hot data is implemented with
precision.
+ ctx.write_file_cache = (compaction_type() ==
ReaderType::READER_CUMULATIVE_COMPACTION) ||
+
(config::enable_file_cache_keep_base_compaction_output &&
+ compaction_type() ==
ReaderType::READER_BASE_COMPACTION);
ctx.file_cache_ttl_sec = _tablet->ttl_seconds();
_output_rs_writer = DORIS_TRY(_tablet->create_rowset_writer(ctx,
_is_vertical));
RETURN_IF_ERROR(_engine.meta_mgr().prepare_rowset(*_output_rs_writer->rowset_meta().get()));
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]