github-actions[bot] commented on code in PR #61104:
URL: https://github.com/apache/doris/pull/61104#discussion_r2894195725
##########
be/src/pipeline/exec/distinct_streaming_aggregation_operator.cpp:
##########
@@ -53,6 +53,17 @@ static constexpr StreamingHtMinReductionEntry STREAMING_HT_MIN_REDUCTION[] = {
{.min_ht_mem = 16 * 1024 * 1024, .streaming_ht_min_reduction = 2.0},
};
+static constexpr StreamingHtMinReductionEntry SINGLE_BE_STREAMING_HT_MIN_REDUCTION[] = {
+ // Expand up to L2 cache always.
+ {.min_ht_mem = 0, .streaming_ht_min_reduction = 0.0},
+ // Expand into L3 cache if we look like we're getting some reduction.
+ // At present, The L2 cache is generally 1024k or more
+ {.min_ht_mem = 256 * 1024, .streaming_ht_min_reduction = 5.0},
+ // Expand into main memory if we're getting a significant reduction.
+ // The L3 cache is generally 16MB or more
+ {.min_ht_mem = 16 * 1024 * 1024, .streaming_ht_min_reduction = 10.0},
+};
+
Review Comment:
**[Robustness]** `STREAMING_HT_MIN_REDUCTION_SIZE` is computed from `STREAMING_HT_MIN_REDUCTION`, but it is also used to bound iteration over `SINGLE_BE_STREAMING_HT_MIN_REDUCTION`. Both arrays currently have 3 entries, but if someone modifies one array without the other, the mismatch would silently cause out-of-bounds access or skipped entries.
Consider adding a `static_assert` to enforce that they stay in sync:
```cpp
static_assert(sizeof(SINGLE_BE_STREAMING_HT_MIN_REDUCTION) == sizeof(STREAMING_HT_MIN_REDUCTION),
              "SINGLE_BE and default reduction tables must have the same number of entries");
```
Same applies to `streaming_aggregation_operator.cpp`.
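For illustration, a minimal self-contained sketch of the same check using `std::size` (C++17), which compares element counts directly and so stays correct even if the entry struct ever changes size. The table contents below are illustrative stand-ins, not the exact Doris constants:

```cpp
#include <cstdint>
#include <iterator>

// Hypothetical mirror of the entry struct and the two tables from the PR;
// the values are illustrative only.
struct StreamingHtMinReductionEntry {
    int64_t min_ht_mem;
    double streaming_ht_min_reduction;
};

static constexpr StreamingHtMinReductionEntry STREAMING_HT_MIN_REDUCTION[] = {
        {.min_ht_mem = 0, .streaming_ht_min_reduction = 0.0},
        {.min_ht_mem = 1024 * 1024, .streaming_ht_min_reduction = 1.1},
        {.min_ht_mem = 16 * 1024 * 1024, .streaming_ht_min_reduction = 2.0},
};
static constexpr StreamingHtMinReductionEntry SINGLE_BE_STREAMING_HT_MIN_REDUCTION[] = {
        {.min_ht_mem = 0, .streaming_ht_min_reduction = 0.0},
        {.min_ht_mem = 256 * 1024, .streaming_ht_min_reduction = 5.0},
        {.min_ht_mem = 16 * 1024 * 1024, .streaming_ht_min_reduction = 10.0},
};

// std::size counts elements, not bytes, so this assertion is robust to
// future changes in the size of StreamingHtMinReductionEntry.
static_assert(std::size(SINGLE_BE_STREAMING_HT_MIN_REDUCTION) ==
                      std::size(STREAMING_HT_MIN_REDUCTION),
              "single-BE and default reduction tables must have the same number of entries");
```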
##########
be/src/runtime/fragment_mgr.cpp:
##########
@@ -923,6 +923,9 @@ Status FragmentMgr::exec_plan_fragment(const TPipelineFragmentParams& params,
if (!params.__isset.need_wait_execution_trigger ||
!params.need_wait_execution_trigger) {
query_ctx->set_ready_to_execute_only();
}
+    query_ctx->set_single_backend_query(params.__isset.query_options &&
+                                        params.query_options.__isset.single_backend_query &&
Review Comment:
**[Critical Bug]** `set_single_backend_query` is called here at line 926,
but `context->prepare()` was already called at line 892. During `prepare()`,
the pipeline tasks are built and local states are constructed — both
`StreamingAggLocalState` and `DistinctStreamingAggLocalState` read
`state->get_query_ctx()->is_single_backend_query()` in their constructors to
initialize `_is_single_backend`. Since the flag hasn't been set yet at
construction time, `_is_single_backend` will **always be `false`**.
**The feature is effectively dead code.**
Fix: Move this `set_single_backend_query` call to **before**
`context->prepare()` (e.g., right after `_get_or_create_query_ctx` at line 880).
##########
be/src/pipeline/exec/streaming_aggregation_operator.cpp:
##########
@@ -66,13 +66,24 @@ struct StreamingHtMinReductionEntry {
// of the machine that we're running on.
static constexpr StreamingHtMinReductionEntry STREAMING_HT_MIN_REDUCTION[] = {
// Expand up to L2 cache always.
- {0, 0.0},
+ {.min_ht_mem = 0, .streaming_ht_min_reduction = 0.0},
// Expand into L3 cache if we look like we're getting some reduction.
// At present, The L2 cache is generally 1024k or more
- {1024 * 1024, 1.1},
+ {.min_ht_mem = 256 * 1024, .streaming_ht_min_reduction = 1.1},
Review Comment:
**[Behavioral Change — Needs Clarification]** The L2 cache threshold for the **default** (non-single-BE) path was silently changed from `1024 * 1024` (1MB) to `256 * 1024` (256KB). This also affects all multi-backend queries: the 1.1x reduction requirement for expanding into L3 cache now kicks in once the hash table reaches 256KB instead of 1MB.
The comment still says "The L2 cache is generally 1024k or more", which is
now inconsistent with the 256KB value.
Is this intentional? If so, it should be called out in the PR description as
a separate change. If not, this is a regression affecting all queries.
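To see why the threshold matters, here is a hedged sketch of how such a min-reduction table is typically consulted (modeled on the Impala-derived logic this operator descends from, not the exact Doris implementation): the cutoff applied is that of the largest `min_ht_mem` entry the current hash-table memory has reached:

```cpp
#include <cstdint>

struct StreamingHtMinReductionEntry {
    int64_t min_ht_mem;
    double streaming_ht_min_reduction;
};

// Illustrative table using the new 256KB threshold from the diff.
static constexpr StreamingHtMinReductionEntry kTable[] = {
        {.min_ht_mem = 0, .streaming_ht_min_reduction = 0.0},
        {.min_ht_mem = 256 * 1024, .streaming_ht_min_reduction = 1.1},
        {.min_ht_mem = 16 * 1024 * 1024, .streaming_ht_min_reduction = 2.0},
};

// Returns the minimum reduction factor required at the given hash-table size:
// the cutoff of the last entry whose min_ht_mem has been reached.
double min_reduction_for(int64_t ht_mem) {
    double cutoff = 0.0;
    for (const auto& e : kTable) {
        if (ht_mem < e.min_ht_mem) break;
        cutoff = e.streaming_ht_min_reduction;
    }
    return cutoff;
}
```

With this lookup, a 512KB hash table already needs a 1.1x reduction under the new 256KB threshold, whereas under the old 1MB threshold it would have needed none, which is the behavioral change the comment flags.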
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]