github-actions[bot] commented on code in PR #61104:
URL: https://github.com/apache/doris/pull/61104#discussion_r2894195725
##########
be/src/pipeline/exec/distinct_streaming_aggregation_operator.cpp:
##########
@@ -53,6 +53,17 @@ static constexpr StreamingHtMinReductionEntry STREAMING_HT_MIN_REDUCTION[] = {
{.min_ht_mem = 16 * 1024 * 1024, .streaming_ht_min_reduction = 2.0},
};
+static constexpr StreamingHtMinReductionEntry SINGLE_BE_STREAMING_HT_MIN_REDUCTION[] = {
+ // Expand up to L2 cache always.
+ {.min_ht_mem = 0, .streaming_ht_min_reduction = 0.0},
+ // Expand into L3 cache if we look like we're getting some reduction.
+ // At present, The L2 cache is generally 1024k or more
+ {.min_ht_mem = 256 * 1024, .streaming_ht_min_reduction = 5.0},
+ // Expand into main memory if we're getting a significant reduction.
+ // The L3 cache is generally 16MB or more
+ {.min_ht_mem = 16 * 1024 * 1024, .streaming_ht_min_reduction = 10.0},
+};
+
Review Comment:
**[Robustness]** `STREAMING_HT_MIN_REDUCTION_SIZE` is computed from `STREAMING_HT_MIN_REDUCTION`, but it is also used to bound iteration over `SINGLE_BE_STREAMING_HT_MIN_REDUCTION`. Both arrays currently have 3 entries, but if someone modifies one array without the other, the mismatch would silently cause out-of-bounds access or skipped entries.
Consider adding a `static_assert` to enforce that they stay in sync:
```cpp
static_assert(sizeof(SINGLE_BE_STREAMING_HT_MIN_REDUCTION) == sizeof(STREAMING_HT_MIN_REDUCTION),
              "SINGLE_BE and default reduction tables must have the same number of entries");
```
Same applies to `streaming_aggregation_operator.cpp`.
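For illustration, a minimal self-contained sketch of the same check using `std::size` (C++17), which compares element counts directly and so stays correct even if the entry struct ever changes size. The table contents below are illustrative stand-ins, not the exact Doris constants:

```cpp
#include <cstdint>
#include <iterator>

// Hypothetical mirror of the entry struct and the two tables from the PR;
// the values are illustrative only.
struct StreamingHtMinReductionEntry {
    int64_t min_ht_mem;
    double streaming_ht_min_reduction;
};

static constexpr StreamingHtMinReductionEntry STREAMING_HT_MIN_REDUCTION[] = {
        {.min_ht_mem = 0, .streaming_ht_min_reduction = 0.0},
        {.min_ht_mem = 1024 * 1024, .streaming_ht_min_reduction = 1.1},
        {.min_ht_mem = 16 * 1024 * 1024, .streaming_ht_min_reduction = 2.0},
};
static constexpr StreamingHtMinReductionEntry SINGLE_BE_STREAMING_HT_MIN_REDUCTION[] = {
        {.min_ht_mem = 0, .streaming_ht_min_reduction = 0.0},
        {.min_ht_mem = 256 * 1024, .streaming_ht_min_reduction = 5.0},
        {.min_ht_mem = 16 * 1024 * 1024, .streaming_ht_min_reduction = 10.0},
};

// std::size counts elements, not bytes, so this assertion is robust to
// future changes in the size of StreamingHtMinReductionEntry.
static_assert(std::size(SINGLE_BE_STREAMING_HT_MIN_REDUCTION) ==
                      std::size(STREAMING_HT_MIN_REDUCTION),
              "single-BE and default reduction tables must have the same number of entries");
```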
##########
be/src/runtime/fragment_mgr.cpp:
##########
@@ -923,6 +923,9 @@ Status FragmentMgr::exec_plan_fragment(const TPipelineFragmentParams& params,
if (!params.__isset.need_wait_execution_trigger ||
!params.need_wait_execution_trigger) {
query_ctx->set_ready_to_execute_only();
}
+    query_ctx->set_single_backend_query(params.__isset.query_options &&
+                                        params.query_options.__isset.single_backend_query &&
Review Comment:
**[Critical Bug]** `set_single_backend_query` is called here at line 926,
but `context->prepare()` was already called at line 892. During `prepare()`,
the pipeline tasks are built and local states are constructed — both
`StreamingAggLocalState` and `DistinctStreamingAggLocalState` read
`state->get_query_ctx()->is_single_backend_query()` in their constructors to
initialize `_is_single_backend`. Since the flag hasn't been set yet at
construction time, `_is_single_backend` will **always be `false`**.
**The feature is effectively dead code.**
Fix: Move this `set_single_backend_query` call to **before**
`context->prepare()` (e.g., right after `_get_or_create_query_ctx` at line 880).
##########
be/src/pipeline/exec/streaming_aggregation_operator.cpp:
##########
@@ -66,13 +66,24 @@ struct StreamingHtMinReductionEntry {
// of the machine that we're running on.
static constexpr StreamingHtMinReductionEntry STREAMING_HT_MIN_REDUCTION[] = {
// Expand up to L2 cache always.
- {0, 0.0},
+ {.min_ht_mem = 0, .streaming_ht_min_reduction = 0.0},
// Expand into L3 cache if we look like we're getting some reduction.
// At present, The L2 cache is generally 1024k or more
- {1024 * 1024, 1.1},
+ {.min_ht_mem = 256 * 1024, .streaming_ht_min_reduction = 1.1},
Review Comment:
**[Behavioral Change — Needs Clarification]** The L2 cache threshold for the **default** (non-single-BE) path was silently changed from `1024 * 1024` (1MB) to `256 * 1024` (256KB). This also affects all multi-backend queries: the 1.1x reduction requirement for expanding into L3 cache now kicks in once the hash table reaches 256KB instead of 1MB.
The comment still says "The L2 cache is generally 1024k or more", which is
now inconsistent with the 256KB value.
Is this intentional? If so, it should be called out in the PR description as
a separate change. If not, this is a regression affecting all queries.
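To see why the threshold matters, here is a hedged sketch of how such a min-reduction table is typically consulted (modeled on the Impala-derived logic this operator descends from, not the exact Doris implementation): the cutoff applied is that of the largest `min_ht_mem` entry the current hash-table memory has reached:

```cpp
#include <cstdint>

struct StreamingHtMinReductionEntry {
    int64_t min_ht_mem;
    double streaming_ht_min_reduction;
};

// Illustrative table using the new 256KB threshold from the diff.
static constexpr StreamingHtMinReductionEntry kTable[] = {
        {.min_ht_mem = 0, .streaming_ht_min_reduction = 0.0},
        {.min_ht_mem = 256 * 1024, .streaming_ht_min_reduction = 1.1},
        {.min_ht_mem = 16 * 1024 * 1024, .streaming_ht_min_reduction = 2.0},
};

// Returns the minimum reduction factor required at the given hash-table size:
// the cutoff of the last entry whose min_ht_mem has been reached.
double min_reduction_for(int64_t ht_mem) {
    double cutoff = 0.0;
    for (const auto& e : kTable) {
        if (ht_mem < e.min_ht_mem) break;
        cutoff = e.streaming_ht_min_reduction;
    }
    return cutoff;
}
```

With this lookup, a 512KB hash table already needs a 1.1x reduction under the new 256KB threshold, whereas under the old 1MB threshold it would have needed none, which is the behavioral change the comment flags.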
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]