github-actions[bot] commented on code in PR #61713:
URL: https://github.com/apache/doris/pull/61713#discussion_r2993941993
##########
be/src/storage/iterator/vcollect_iterator.cpp:
##########
@@ -94,6 +94,16 @@ void VCollectIterator::init(TabletReader* reader, bool
ori_data_overlapping, boo
_topn_limit = 0;
DCHECK_EQ(_reader->_reader_context.filter_block_conjuncts.size(), 0);
}
+
Review Comment:
**Bug [Crash in ASAN/Debug]: Existing DCHECK at line 95 will fire**
Two lines above this new code, at line 95 (context line in this diff,
position 2), there is:
```cpp
DCHECK_EQ(_reader->_reader_context.filter_block_conjuncts.size(), 0);
```
This DCHECK is in the `else` branch, which is taken when
`read_orderby_key_limit` is 0 (topn NOT active). In the general limit path,
`read_orderby_key_limit` is indeed 0, but `filter_block_conjuncts` is non-empty
(set at `olap_scanner.cpp:511`), so the assertion fails. This means:
- In debug/ASAN builds: **process crashes** on the DCHECK assertion failure.
- In release builds: DCHECK is a no-op, so the crash is masked but the
conjuncts are still silently lost.
This DCHECK needs to be updated to allow `filter_block_conjuncts` to be
non-empty when `general_read_limit > 0`.
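A minimal sketch of the relaxed invariant, modeled as a standalone predicate rather than the actual Doris code. The names `conjuncts_state_is_valid` and `check_non_topn_invariant` are hypothetical, and the parameters stand in for values that would come from `_reader->_reader_context`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the invariant in the non-topn branch:
// conjuncts may be present only when general limit pushdown is active.
bool conjuncts_state_is_valid(std::size_t num_filter_block_conjuncts,
                              std::int64_t general_read_limit) {
    return num_filter_block_conjuncts == 0 || general_read_limit > 0;
}

// Mirrors what a relaxed debug-only check would do; assert() is compiled
// out under NDEBUG, much like DCHECK in release builds.
void check_non_topn_invariant(std::size_t num_filter_block_conjuncts,
                              std::int64_t general_read_limit) {
    assert(conjuncts_state_is_valid(num_filter_block_conjuncts, general_read_limit));
}
```

In the real `init()`, the equivalent would presumably be a single `DCHECK` on the same boolean expression in place of the current `DCHECK_EQ(..., 0)`.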
##########
be/src/storage/iterator/vcollect_iterator.cpp:
##########
@@ -94,6 +94,16 @@ void VCollectIterator::init(TabletReader* reader, bool
ori_data_overlapping, boo
_topn_limit = 0;
DCHECK_EQ(_reader->_reader_context.filter_block_conjuncts.size(), 0);
}
+
+ // General limit pushdown: only for READER_QUERY on non-merge path
+ // (DUP_KEYS or UNIQUE_KEYS with MOW)
+ if (!_merge && _reader->_reader_type == ReaderType::READER_QUERY &&
+ _reader->_reader_context.general_read_limit > 0 &&
+ (_reader->_tablet->keys_type() == KeysType::DUP_KEYS ||
+ (_reader->_tablet->keys_type() == KeysType::UNIQUE_KEYS &&
Review Comment:
**Bug [Critical]: General limit counts unfiltered rows**
The `_inner_iter->next(block)` call in the general limit path returns a block
of rows from the storage layer. The `_general_rows_returned` counter then counts
these rows toward the limit. However, unlike `_topn_next()`, which applies
`filter_block_conjuncts` at lines 357-358 before counting, this path counts
**raw/unfiltered** rows.
The comment in `olap_scanner.cpp` says *"general_read_limit counts
post-filter rows, same as the topn path above"* — but no filtering code is
present here to make that true.
If the intent is to count post-filter rows (which is correct), you need to
call `VExprContext::filter_block(_reader->_reader_context.filter_block_conjuncts,
block, block->columns())` here before updating `_general_rows_returned`,
similar to what `_topn_next` does.
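The ordering suggested above (filter, then truncate, then count) can be sketched with a self-contained model. This is not the Doris implementation: `Block`, `Conjunct`, `filter_block`, and `limited_next` are hypothetical stand-ins, with a block reduced to a vector of integers and a conjunct to a plain predicate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins: a "block" is a vector of ints, a conjunct a predicate.
using Block = std::vector<int>;
using Conjunct = std::function<bool(int)>;

// Keeps only the rows that satisfy every conjunct; no-op for an empty list.
void filter_block(const std::vector<Conjunct>& conjuncts, Block* block) {
    if (conjuncts.empty()) return;
    auto fails = [&](int row) {
        return !std::all_of(conjuncts.begin(), conjuncts.end(),
                            [&](const Conjunct& c) { return c(row); });
    };
    block->erase(std::remove_if(block->begin(), block->end(), fails), block->end());
}

// Models one next() call on the general limit path: filter first, truncate
// to the remaining budget, then count. Returns false once the limit has
// already been reached.
bool limited_next(Block* block, const std::vector<Conjunct>& conjuncts,
                  std::size_t limit, std::size_t* rows_returned) {
    assert(block != nullptr && rows_returned != nullptr);
    if (*rows_returned >= limit) {
        block->clear();
        return false;
    }
    filter_block(conjuncts, block);
    const std::size_t remaining = limit - *rows_returned;
    if (block->size() > remaining) block->resize(remaining);
    *rows_returned += block->size();
    return true;
}
```

Because the counter is only advanced after filtering, a block in which every row is rejected contributes nothing to the limit, which is the behavior the comment in `olap_scanner.cpp` describes.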
##########
be/src/exec/scan/olap_scanner.cpp:
##########
@@ -502,6 +502,14 @@ Status OlapScanner::_init_tablet_reader_params(
_tablet_reader_params.filter_block_conjuncts = _conjuncts;
_conjuncts.clear();
}
+ } else if (_limit > 0 && olap_scan_local_state->_storage_no_merge()) {
+ // General limit pushdown for DUP_KEYS and UNIQUE_KEYS with MOW (non-merge path).
+ // Only when topn optimization is NOT active (topn handles its own limit).
+ // Move _conjuncts into storage layer so that general_read_limit counts
+ // post-filter rows, same as the topn path above.
+ _tablet_reader_params.general_read_limit = _limit;
+ _tablet_reader_params.filter_block_conjuncts = _conjuncts;
Review Comment:
**Bug [Critical]: `filter_block_conjuncts` set but never consumed in the
general limit path**
This line moves `_conjuncts` into `filter_block_conjuncts`, and line 512
clears `_conjuncts`. However:
1. `filter_block_conjuncts` is **only evaluated** inside
`VCollectIterator::_topn_next()` (at `vcollect_iterator.cpp:357-358`). The new
general limit path in `VCollectIterator::next(Block*)` (lines 262-281) **never
calls** `VExprContext::filter_block(filter_block_conjuncts, ...)`.
2. Since `_conjuncts` is cleared, `Scanner::_filter_output_block()` becomes
a no-op (empty conjuncts vector → early return in `VExprContext::filter_block`).
**Result**: For queries with WHERE clauses (e.g., `SELECT * FROM t WHERE col
> 5 LIMIT 10`), the filter predicates are **completely dropped**. The limit
counts unfiltered rows, and unfiltered rows are returned to the user.
**Fix**: Either (a) don't move `_conjuncts` into `filter_block_conjuncts`
for the general limit path (keep filtering at the scanner level), or (b) add
explicit `VExprContext::filter_block` calls in the general limit `next()` path
in VCollectIterator, mirroring `_topn_next`.
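To make the failure mode concrete, here is a small self-contained model of the scan, not the actual scanner code. `scan_with_limit` and its parameters are hypothetical; the flag `storage_applies_filter` switches between the behavior described above (conjuncts moved but never evaluated) and fix (b).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Row = int;
using Predicate = std::function<bool(Row)>;

// Hypothetical model: the scanner's conjuncts are moved into the storage
// layer, as the diff does, so the scanner-level filter has nothing to run.
std::vector<Row> scan_with_limit(const std::vector<Row>& table,
                                 std::vector<Predicate> scanner_conjuncts,
                                 std::size_t limit, bool storage_applies_filter) {
    // Move conjuncts to the storage layer and clear the scanner's copy.
    std::vector<Predicate> storage_conjuncts = std::move(scanner_conjuncts);
    scanner_conjuncts.clear();

    std::vector<Row> out;
    for (Row r : table) {
        if (out.size() >= limit) break;
        bool keep = true;
        if (storage_applies_filter) {
            for (const auto& p : storage_conjuncts) keep = keep && p(r);
        }
        if (keep) out.push_back(r);
    }
    // The scanner-level filter would run here, but it is a no-op because
    // the scanner no longer holds any conjuncts.
    assert(scanner_conjuncts.empty());
    return out;
}
```

With a table `{1, 7, 3, 9}`, the predicate `r > 5`, and a limit of 2, the model returns the first two raw rows when the storage layer skips the filter, and only rows matching the predicate when it applies it.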
##########
regression-test/data/query_p0/limit/test_general_limit_pushdown.out:
##########
@@ -0,0 +1,86 @@
+-- This file is automatically generated. You should know what you did if you want to edit this
Review Comment:
**Test issue: Orphaned `.out` file**
This `.out` file contains 7 tagged sections (`dup_basic_limit`,
`dup_filter_limit`, `dup_filter_over_limit`, `dup_complex_filter_limit`,
`mow_basic_limit`, `mow_filter_limit`, `mow_complex_filter_limit`) but **none
of these tags are referenced** by any `order_qt_` query in the `.groovy` test
file.
The `.groovy` file only uses bare `sql` calls with `assertEquals` on counts
— it never produces tagged output. This file is dead data that serves no
purpose and will never be validated by the test framework.
Either remove this file, or rewrite the test to use
`order_qt_dup_basic_limit` etc. queries that produce deterministic, ordered
output matching these sections.
##########
regression-test/suites/query_p0/limit/test_general_limit_pushdown.groovy:
##########
@@ -0,0 +1,179 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Test general LIMIT pushdown into the storage layer (VCollectIterator).
+// This optimization is active only for DUP_KEYS and UNIQUE_KEYS with MOW
+// on the non-merge read path (no ORDER BY on key columns).
+suite("test_general_limit_pushdown", "p0") {
+
Review Comment:
**Test issues:**
1. **No `order_qt_` queries**: All assertions use `assertEquals` on
`count()`. No `order_qt_` or `ORDER BY` is used. Per Doris regression test
standards, use `order_qt_` prefix or explicit `ORDER BY` for deterministic
output.
2. **Does not test the feature**: These tests only verify basic SQL LIMIT
semantics (correct row counts), which would pass identically whether or not the
storage-layer optimization is active. Use `explain` plan checks to verify the
pushdown is active for DUP_KEYS/MOW and inactive for AGG_KEYS/MOR.
3. **Missing WHERE + LIMIT tests**: No test combines `WHERE` predicates with
`LIMIT` to verify that filtering still works correctly when limit is pushed
down. This is the exact scenario where the current implementation has a
correctness bug (filter predicates are dropped).
4. **Orphaned `.out` file**: The companion `.out` file has tagged output
sections that are never referenced. See comment on that file.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]