github-actions[bot] commented on code in PR #64601:
URL: https://github.com/apache/doris/pull/64601#discussion_r3425741553
##########
be/src/storage/tablet/tablet_reader.cpp:
##########
@@ -199,6 +200,37 @@ Status TabletReader::_capture_rs_readers(const
ReaderParams& read_params) {
_reader_context.all_access_paths = read_params.all_access_paths;
_reader_context.predicate_access_paths =
read_params.predicate_access_paths;
+ // Force a full read of delete-condition columns: the FE can't see storage
deletes and may
+ // mark them meta-only (OFFSET/NULL), whose content-less read makes the
delete predicate
+ // match nothing and leak deleted rows.
+ if (!read_params.delete_predicates.empty() &&
!_reader_context.all_access_paths.empty()) {
+ // Use the predicate's column unique id, falling back to its name for
+ // predicates without one (mirrors DeleteHandler::_parse_column_pred).
+ auto erase_delete_column = [&](const auto& delete_pred, const
TabletSchema& delete_schema) {
+ int32_t uid;
+ if (delete_pred.has_column_unique_id()) {
+ uid = delete_pred.column_unique_id();
+ } else {
+ int32_t idx =
delete_schema.field_index(delete_pred.column_name());
+ if (idx < 0) {
+ return;
+ }
+ uid = delete_schema.column(idx).unique_id();
+ }
+ _reader_context.all_access_paths.erase(uid);
+ };
+ for (const auto& rs_meta : read_params.delete_predicates) {
+ const auto& delete_predicate = rs_meta->delete_predicate();
+ const auto& delete_schema = *rs_meta->tablet_schema();
+ for (const auto& sub_pred : delete_predicate.sub_predicates_v2()) {
Review Comment:
This only strips access paths for `sub_predicates_v2()` and
`in_predicates()`, but `DeleteHandler::init()` still evaluates legacy
`delete_predicate.sub_predicates()` when v2 is empty. That is not just dead
compatibility code: cloud rowset sync copies
`RowsetMetaCloudPB.delete_predicate` into `RowsetMetaPB` without converting it
(`cloud_rowset_meta_to_doris`), so upgraded/cloud rowsets can still reach this
reader with only legacy string delete predicates. In that case
`_delete_handler` will apply `v = '--'`, but `_reader_context.all_access_paths`
keeps the FE OFFSET/NULL path for `v`, so the same deleted-row leak remains.
Please mirror the `DeleteHandler` fallback here by handling `sub_predicates()`
when v2 is empty, or collect the affected column ids through the same
parsing/conversion helper, before leaving the column eligible for meta-only
reads.
##########
regression-test/suites/delete_p0/test_offset_only_delete_leak.groovy:
##########
@@ -0,0 +1,67 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// When a query only needs string lengths (length(v), or v='' which is
rewritten to
+// length=0), a dict-encoded varchar column is read in OFFSET_ONLY mode, which
only
+// materializes lengths and leaves the char content empty. If that same column
also
+// carries a DELETE condition, the storage-side delete predicate compares the
empty
+// content and matches nothing, so the deleted rows leak back into the scan
output.
+suite("test_offset_only_delete_leak", "p0") {
+ def tbl = "test_offset_only_delete_leak"
Review Comment:
This new regression test does not follow the test rules in `AGENTS.md`:
ordinary single-table tests should hardcode the table name instead of `def
tbl`, deterministic expected results should be captured with `qt_`/`order_qt_`
plus generated `.out` rather than `assertEquals`/`assertTrue`, and the table
should not be dropped at the end. Please rewrite the case in that style so the
expected results are versioned by the regression framework and the table
remains for debugging after failures.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]