Re: [PR] [feature](be) Implement adaptive batch size with EWMA prediction for byte-budget control [doris]

via GitHub Tue, 07 Apr 2026 04:37:51 -0700


Copilot commented on code in PR #61535:
URL: https://github.com/apache/doris/pull/61535#discussion_r3044497622



##########
be/src/runtime/runtime_state.h:
##########
@@ -138,7 +138,42 @@ class RuntimeState {
 
     const DescriptorTbl& desc_tbl() const { return *_desc_tbl; }
     void set_desc_tbl(const DescriptorTbl* desc_tbl) { _desc_tbl = desc_tbl; }
+
+    MOCK_FUNCTION int block_max_rows() const {
+        if (config::enable_adaptive_batch_size && preferred_block_size_bytes() 
> 0) {
+            auto rows = preferred_block_size_rows();
+            return static_cast<int>(std::min(rows, 
static_cast<size_t>(INT32_MAX)));
+        }
+        return _query_options.batch_size;
+    }
+
+    MOCK_FUNCTION size_t block_max_bytes() const {
+        return config::enable_adaptive_batch_size ? 
preferred_block_size_bytes() : 0;
+    }
+
     MOCK_FUNCTION int batch_size() const { return _query_options.batch_size; }
+    MOCK_FUNCTION size_t preferred_block_size_bytes() const {
+        if (_query_options.__isset.preferred_block_size_bytes) {
+            auto v = _query_options.preferred_block_size_bytes;
+            return v > 0 ? static_cast<size_t>(v) : 0;
+        }
+        return 0;
+    }

Review Comment:
   RuntimeState::preferred_block_size_bytes() returns 0 when the query option 
is not explicitly set (__isset is false). This makes adaptive batch sizing 
effectively disabled by default, which is inconsistent with the stated default 
of 8MB (and also inconsistent with 
preferred_block_size_rows()/preferred_max_column_in_block_size_bytes(), which 
do return defaults when unset). Consider returning 8MB when __isset is false 
(similar to spill_buffer_size_bytes()), so the BE-side default behavior matches 
the documented defaults even if an older FE doesn’t forward the new field.



##########
regression-test/suites/query_p0/adaptive_batch_size/adaptive_batch_size.groovy:
##########
@@ -0,0 +1,212 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Regression tests for the Adaptive Batch Size feature.
+//
+// Design notes:
+//   - Each case runs the same query with the feature enabled and disabled, and
+//     asserts that results are identical (correctness check).
+//   - We do NOT directly assert internal block byte sizes, because the storage
+//     layer does not expose them via SQL result columns.  Correctness is the
+//     primary requirement; performance / memory reduction is verified manually
+//     or via profile counters in a separate benchmark.
+
+suite("adaptive_batch_size") {
+
+    // ── helpers 
────────────────────────────────────────────────────────────────
+
+    def set_adaptive = { enabled ->
+        // preferred_block_size_bytes controls the feature end-to-end.
+        // Setting it to a very small value (e.g. 1 byte) forces the predictor 
to
+        // return row count = 1 every time, which is a good stress test.
+        if (enabled) {
+            sql "set preferred_block_size_bytes = 8388608"     // 8 MB 
(default)
+            sql "set batch_size = 65535"
+            sql "set preferred_max_column_in_block_size_bytes = 1048576" // 1 
MB
+        } else {
+            // Setting to 0 disables the byte-limit path at the accumulation 
layer.
+            sql "set preferred_block_size_bytes = 0"
+            sql "set batch_size = 4096"
+            sql "set preferred_max_column_in_block_size_bytes = 0"
+        }
+    }
+
+    // ── Test 1: wide table (VARCHAR columns) 
──────────────────────────────────
+    // Each row is ~10 KB; with 4096 rows that is ~40 MB/batch which OOM-risks.
+    // With adaptive=on the batch is trimmed to ~8 MB worth of rows.
+
+    sql "drop table if exists abs_wide_table"
+    sql """
+        create table abs_wide_table (
+            id      int         not null,
+            c1      varchar(4096),
+            c2      varchar(4096),
+            c3      varchar(4096)
+        )
+        ENGINE=OLAP
+        DUPLICATE KEY(id)
+        DISTRIBUTED BY HASH(id) BUCKETS 1
+        PROPERTIES ("replication_allocation" = "tag.location.default: 1")
+    """
+
+    // Insert 1000 rows with ~3 KB data each.
+    def wide_rows = (1..1000).collect { i ->
+        "(${i}, '${('a' * 1000)}', '${('b' * 1000)}', '${('c' * 1000)}')"
+    }
+    sql "insert into abs_wide_table values ${wide_rows.join(',')}"
+

Review Comment:
   The wide-table data load builds a single INSERT statement with 1000 rows of 
~3KB string literals (wide_rows.join(',')), which can produce a multi‑MB SQL 
string. This risks exceeding max query length / max_packet limits in some CI 
environments and can make the regression suite flaky/slow. Prefer generating 
the data on the BE side (e.g., INSERT…SELECT from numbers() with 
repeat()/lpad(), or inserting in smaller batches) to keep the statement size 
bounded.



##########
be/src/storage/iterator/vcollect_iterator.cpp:
##########
@@ -913,6 +1006,24 @@ Status 
VCollectIterator::Level1Iterator::_merge_next(Block* block) {
             continuous_row_in_block = 0;
             pre_row_ref = cur_row;
         }
+
+        // Byte-budget check: _merge_next() has already advanced _ref to the 
next unread row,
+        // so it is safe to stop here without duplicating any data.
+        if (collected_enough_rows(target_columns, continuous_row_in_block)) {
+            if (continuous_row_in_block > 0) {

Review Comment:
   In Level1Iterator::_merge_next(), collected_enough_rows() is evaluated on 
every loop iteration. That call recomputes Block::columns_byte_size(columns) by 
summing byte_size() across all columns each time, which makes the merge loop 
O(rows * columns) and can be a noticeable CPU regression on wide schemas. 
Consider checking the byte budget only every N rows (similar to BlockReader’s 
interval checks), or maintaining an incremental byte counter updated when 
ranges are inserted, so you avoid rescanning all columns per row.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [feature](be) Implement adaptive batch size with EWMA prediction for byte-budget control [doris]

Reply via email to