This is an automated email from the ASF dual-hosted git repository.
dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 408f4a48e61 [config](load) improve in-memory aggregation triggering for AGG KEY tables (#52305)
408f4a48e61 is described below
commit 408f4a48e6196a3bf1892bfc601a1c953da3d6d0
Author: Kaijie Chen <[email protected]>
AuthorDate: Fri Aug 15 21:15:48 2025 +0800
[config](load) improve in-memory aggregation triggering for AGG KEY tables (#52305)
### What problem does this PR solve?
This commit makes two related changes to enable more efficient and timely in-memory aggregation during data loading:
1. Reduce the default value of `write_buffer_size_for_agg` from 400MB to 100MB. The previous default exceeded `write_buffer_size` (200MB), so `need_flush()` always triggered before `need_agg()` could, effectively skipping the intended aggregation step.
2. Base the `need_agg()` check on memory growth *since the last aggregation* rather than on total memory usage. The growth is tracked with a new `_last_agg_pos` field, which records `memory_usage()` at the end of each `shrink_memtable_by_agg()` call. This prevents back-to-back aggregations when an aggregation fails to bring memory usage back below the threshold, and makes memory management more adaptive and efficient.
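To make the two changes concrete, here is a minimal C++ sketch that simulates the trigger logic of a single memtable writer. It is a simplified model, not Doris code: the `Thresholds`, `SimResult`, and `simulate` names are hypothetical, sizes are in MiB, the flush check is assumed to run before the aggregation check after every write, a flush is assumed to start a fresh memtable (so `_last_agg_pos` is back at 0), and one aggregation is assumed to keep a fixed percentage of the current usage.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the memtable triggers; not Doris code.
// All sizes are in MiB.
struct Thresholds {
    int64_t flush_mb;     // stands in for config::write_buffer_size
    int64_t agg_mb;       // stands in for config::write_buffer_size_for_agg
    bool since_last_agg;  // true: new rule (growth since last aggregation)
};

struct SimResult {
    int aggregations = 0;
    int flushes = 0;
};

// Feeds `writes` chunks of `chunk_mb` MiB into a memtable. Assumptions: the
// flush check runs before the aggregation check after every write, a flush
// starts a fresh memtable, and one aggregation keeps `keep_pct` percent of the
// current usage (real shrinkage depends on how many keys are duplicated).
SimResult simulate(const Thresholds& t, int writes, int64_t chunk_mb,
                   int64_t keep_pct) {
    assert(chunk_mb > 0 && keep_pct >= 0 && keep_pct <= 100);
    SimResult r;
    int64_t usage = 0;
    int64_t last_agg_pos = 0;  // plays the role of _last_agg_pos
    for (int i = 0; i < writes; ++i) {
        usage += chunk_mb;
        if (usage >= t.flush_mb) {  // need_flush(): flush, start a new memtable
            ++r.flushes;
            usage = 0;
            last_agg_pos = 0;
            continue;
        }
        // need_agg(): the old rule compares total usage, the new rule growth.
        const int64_t max_size =
            t.since_last_agg ? last_agg_pos + t.agg_mb : t.agg_mb;
        if (usage >= max_size) {
            ++r.aggregations;
            usage = usage * keep_pct / 100;  // models shrink_memtable_by_agg()
            last_agg_pos = usage;            // models _last_agg_pos = memory_usage();
        }
    }
    return r;
}
```

With 100 writes of 10 MiB and no shrinkage (`keep_pct = 100`, i.e. no duplicate keys): the old defaults `{200, 400, false}` give 0 aggregations and 5 flushes; a hypothetical intermediate with the new 100 MiB default but the old total-usage rule `{200, 100, false}` aggregates on every write from 100 MiB up to the flush, 50 times in total; the growth-based rule `{200, 100, true}` aggregates once per memtable, 5 times in total.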
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
---------
Co-authored-by: Yongqiang YANG <[email protected]>
---
be/src/common/config.cpp | 3 +--
be/src/olap/memtable.cpp | 3 ++-
be/src/olap/memtable.h | 1 +
3 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/be/src/common/config.cpp b/be/src/common/config.cpp
index 50c8ca70193..5ab1d344b44 100644
--- a/be/src/common/config.cpp
+++ b/be/src/common/config.cpp
@@ -685,8 +685,7 @@ DEFINE_mInt32(memory_gc_sleep_time_ms, "500");
// max write buffer size before flush, default 200MB
DEFINE_mInt64(write_buffer_size, "209715200");
// max buffer size used in memtable for the aggregated table, default 400MB
-DEFINE_mInt64(write_buffer_size_for_agg, "419430400");
-
+DEFINE_mInt64(write_buffer_size_for_agg, "104857600");
DEFINE_mInt64(min_write_buffer_size_for_partial_update, "1048576");
// max parallel flush task per memtable writer
DEFINE_mInt32(memtable_flush_running_count_limit, "2");
diff --git a/be/src/olap/memtable.cpp b/be/src/olap/memtable.cpp
index e8d2904c2bb..1dbe2e8020b 100644
--- a/be/src/olap/memtable.cpp
+++ b/be/src/olap/memtable.cpp
@@ -647,6 +647,7 @@ void MemTable::shrink_memtable_by_agg() {
if (same_keys_num != 0) {
(_skip_bitmap_col_idx == -1) ? _aggregate<false, false>() :
_aggregate<false, true>();
}
+ _last_agg_pos = memory_usage();
}
bool MemTable::need_flush() const {
@@ -663,7 +664,7 @@ bool MemTable::need_flush() const {
bool MemTable::need_agg() const {
if (_keys_type == KeysType::AGG_KEYS) {
- auto max_size = config::write_buffer_size_for_agg;
+ auto max_size = _last_agg_pos + config::write_buffer_size_for_agg;
return memory_usage() >= max_size;
}
return false;
diff --git a/be/src/olap/memtable.h b/be/src/olap/memtable.h
index f582c26c029..5cd70e812a9 100644
--- a/be/src/olap/memtable.h
+++ b/be/src/olap/memtable.h
@@ -245,6 +245,7 @@ private:
vectorized::MutableBlock _input_mutable_block;
vectorized::MutableBlock _output_mutable_block;
size_t _last_sorted_pos = 0;
+ size_t _last_agg_pos = 0;
//return number of same keys
size_t _sort();
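The byte values in the `config.cpp` hunk can be checked against the sizes quoted in the description with a few compile-time assertions; the "MB" figures are 2^20-byte units. The constant names below are illustrative, not Doris identifiers; only the values are copied from the diff. Note that the unchanged context comment above the setting still reads "default 400MB", although the new default is 100MB.

```cpp
#include <cstdint>

// Illustrative names; the byte values are copied from the config.cpp hunk.
constexpr int64_t kMiB = int64_t{1024} * 1024;
constexpr int64_t kWriteBufferSize = 209715200;           // write_buffer_size
constexpr int64_t kOldWriteBufferSizeForAgg = 419430400;  // before this commit
constexpr int64_t kNewWriteBufferSizeForAgg = 104857600;  // after this commit

static_assert(kWriteBufferSize == 200 * kMiB, "write_buffer_size is 200 MiB");
static_assert(kOldWriteBufferSizeForAgg == 400 * kMiB, "old default is 400 MiB");
static_assert(kNewWriteBufferSizeForAgg == 100 * kMiB, "new default is 100 MiB");

// Old default: the aggregation threshold sat above the flush threshold, which
// per the commit description meant need_flush() always fired first.
static_assert(kOldWriteBufferSizeForAgg > kWriteBufferSize, "old: agg unreachable");
// New default: with _last_agg_pos starting at 0, the first aggregation
// triggers at half the flush threshold.
static_assert(2 * kNewWriteBufferSizeForAgg == kWriteBufferSize, "new: agg at 50%");
```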