This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 408f4a48e61 [config](load) improve in-memory aggregation triggering for AGG KEY tables (#52305)
408f4a48e61 is described below

commit 408f4a48e6196a3bf1892bfc601a1c953da3d6d0
Author: Kaijie Chen <[email protected]>
AuthorDate: Fri Aug 15 21:15:48 2025 +0800

    [config](load) improve in-memory aggregation triggering for AGG KEY tables (#52305)
    
    ### What problem does this PR solve?
    
    This commit makes two related changes to enable more efficient and
    timely in-memory aggregation during data loading:
    
    1. Reduce the default value of `write_buffer_size_for_agg` from 400MB to
       100MB. The previous default exceeded the `write_buffer_size` (200MB),
       causing `need_agg()` to never trigger before `need_flush()`,
       effectively skipping the intended aggregation step.
    
    2. Refine the `need_agg()` logic to be based on memory growth *since the
       last aggregation*, rather than total memory usage. This is tracked
       using a new `_last_agg_pos` field, which is updated after each
       aggregation. This prevents repeated aggregation when memory usage
       remains stagnant and allows for more adaptive and efficient memory
       management.
    
    ### Release note
    
    None
    
    ### Check List (For Author)
    
    - Test <!-- At least one of them must be included. -->
        - [ ] Regression test
        - [ ] Unit Test
        - [ ] Manual test (add detailed scripts or steps below)
        - [ ] No need to test or manual test. Explain why:
            - [ ] This is a refactor/code format and no logic has been changed.
            - [ ] Previous test can cover this change.
            - [ ] No code files have been changed.
            - [ ] Other reason <!-- Add your reason?  -->
    
    - Behavior changed:
        - [ ] No.
        - [ ] Yes. <!-- Explain the behavior change -->
    
    - Does this need documentation?
        - [ ] No.
        - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
    
    ### Check List (For Reviewer who merge this PR)
    
    - [ ] Confirm the release note
    - [ ] Confirm test cases
    - [ ] Confirm document
    - [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
    
    ---------
    
    Co-authored-by: Yongqiang YANG <[email protected]>
---
 be/src/common/config.cpp | 3 +--
 be/src/olap/memtable.cpp | 3 ++-
 be/src/olap/memtable.h   | 1 +
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/be/src/common/config.cpp b/be/src/common/config.cpp
index 50c8ca70193..5ab1d344b44 100644
--- a/be/src/common/config.cpp
+++ b/be/src/common/config.cpp
@@ -685,8 +685,7 @@ DEFINE_mInt32(memory_gc_sleep_time_ms, "500");
 // max write buffer size before flush, default 200MB
 DEFINE_mInt64(write_buffer_size, "209715200");
 // max buffer size used in memtable for the aggregated table, default 400MB
-DEFINE_mInt64(write_buffer_size_for_agg, "419430400");
-
+DEFINE_mInt64(write_buffer_size_for_agg, "104857600");
 DEFINE_mInt64(min_write_buffer_size_for_partial_update, "1048576");
 // max parallel flush task per memtable writer
 DEFINE_mInt32(memtable_flush_running_count_limit, "2");
diff --git a/be/src/olap/memtable.cpp b/be/src/olap/memtable.cpp
index e8d2904c2bb..1dbe2e8020b 100644
--- a/be/src/olap/memtable.cpp
+++ b/be/src/olap/memtable.cpp
@@ -647,6 +647,7 @@ void MemTable::shrink_memtable_by_agg() {
     if (same_keys_num != 0) {
         (_skip_bitmap_col_idx == -1) ? _aggregate<false, false>() : _aggregate<false, true>();
     }
+    _last_agg_pos = memory_usage();
 }
 
 bool MemTable::need_flush() const {
@@ -663,7 +664,7 @@ bool MemTable::need_flush() const {
 
 bool MemTable::need_agg() const {
     if (_keys_type == KeysType::AGG_KEYS) {
-        auto max_size = config::write_buffer_size_for_agg;
+        auto max_size = _last_agg_pos + config::write_buffer_size_for_agg;
         return memory_usage() >= max_size;
     }
     return false;
diff --git a/be/src/olap/memtable.h b/be/src/olap/memtable.h
index f582c26c029..5cd70e812a9 100644
--- a/be/src/olap/memtable.h
+++ b/be/src/olap/memtable.h
@@ -245,6 +245,7 @@ private:
     vectorized::MutableBlock _input_mutable_block;
     vectorized::MutableBlock _output_mutable_block;
     size_t _last_sorted_pos = 0;
+    size_t _last_agg_pos = 0;
 
     //return number of same keys
     size_t _sort();
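
The threshold arithmetic behind the two changes above can be checked in isolation. The following is a minimal sketch, not the Doris implementation: the free functions, `kMB`, `kWriteBufferSize`, and `agg_can_fire_before_flush` are hypothetical names introduced for illustration, and `need_flush` is reduced to a single comparison, whereas the real `MemTable::need_flush()` takes more inputs into account.

```cpp
#include <cstddef>

constexpr std::size_t kMB = 1024 * 1024;
// Default of write_buffer_size from config.cpp: 209715200 bytes = 200MB.
constexpr std::size_t kWriteBufferSize = 200 * kMB;

// Simplified flush check (assumption: only total usage matters here).
constexpr bool need_flush(std::size_t usage) {
    return usage >= kWriteBufferSize;
}

// Rule before the commit: total usage is compared against the threshold.
constexpr bool need_agg_total(std::size_t usage, std::size_t agg_threshold) {
    return usage >= agg_threshold;
}

// Rule after the commit: growth since the last aggregation is compared
// against the threshold (last_agg_pos mirrors the new _last_agg_pos field).
constexpr bool need_agg_growth(std::size_t usage, std::size_t last_agg_pos,
                               std::size_t agg_threshold) {
    return usage >= last_agg_pos + agg_threshold;
}

// Hypothetical helper: walks usage upward in 1MB steps and reports whether
// the total-usage rule fires at any step strictly before need_flush() does.
constexpr bool agg_can_fire_before_flush(std::size_t agg_threshold) {
    for (std::size_t usage = 0; !need_flush(usage); usage += kMB) {
        if (need_agg_total(usage, agg_threshold)) {
            return true;
        }
    }
    return false;
}

// Old default (400MB) is above the flush limit, so aggregation never fires first.
static_assert(!agg_can_fire_before_flush(400 * kMB), "400MB threshold is unreachable");
// New default (100MB) is below the flush limit, so aggregation can fire first.
static_assert(agg_can_fire_before_flush(100 * kMB), "100MB threshold is reachable");

// Stagnant case: an aggregation that left usage at 150MB.
// The total-usage rule would ask for another aggregation immediately...
static_assert(need_agg_total(150 * kMB, 100 * kMB), "total rule re-triggers");
// ...while the growth rule waits for another 100MB of growth.
static_assert(!need_agg_growth(150 * kMB, 150 * kMB, 100 * kMB), "growth rule waits");
```

The `static_assert` lines make the block self-checking at compile time (C++14 or later). In this simplified model, with usage and `last_agg_pos` both at 150MB, the growth rule would next fire at 250MB, which is above the 200MB flush limit, so a flush would come first. Note also that the context comment above `write_buffer_size_for_agg` in `config.cpp` still reads "default 400MB", although the new value, 104857600 bytes, is 100MB.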


