github-actions[bot] commented on code in PR #64601:
URL: https://github.com/apache/doris/pull/64601#discussion_r3425741553


##########
be/src/storage/tablet/tablet_reader.cpp:
##########
@@ -199,6 +200,37 @@ Status TabletReader::_capture_rs_readers(const 
ReaderParams& read_params) {
     _reader_context.all_access_paths = read_params.all_access_paths;
     _reader_context.predicate_access_paths = 
read_params.predicate_access_paths;
 
+    // Force a full read of delete-condition columns: the FE can't see storage 
deletes and may
+    // mark them meta-only (OFFSET/NULL), whose content-less read makes the 
delete predicate
+    // match nothing and leak deleted rows.
+    if (!read_params.delete_predicates.empty() && 
!_reader_context.all_access_paths.empty()) {
+        // Use the predicate's column unique id, falling back to its name for
+        // predicates without one (mirrors DeleteHandler::_parse_column_pred).
+        auto erase_delete_column = [&](const auto& delete_pred, const 
TabletSchema& delete_schema) {
+            int32_t uid;
+            if (delete_pred.has_column_unique_id()) {
+                uid = delete_pred.column_unique_id();
+            } else {
+                int32_t idx = 
delete_schema.field_index(delete_pred.column_name());
+                if (idx < 0) {
+                    return;
+                }
+                uid = delete_schema.column(idx).unique_id();
+            }
+            _reader_context.all_access_paths.erase(uid);
+        };
+        for (const auto& rs_meta : read_params.delete_predicates) {
+            const auto& delete_predicate = rs_meta->delete_predicate();
+            const auto& delete_schema = *rs_meta->tablet_schema();
+            for (const auto& sub_pred : delete_predicate.sub_predicates_v2()) {

Review Comment:
   This only strips access paths for `sub_predicates_v2()` and 
`in_predicates()`, but `DeleteHandler::init()` still evaluates legacy 
`delete_predicate.sub_predicates()` when v2 is empty. That is not just dead 
compatibility code: cloud rowset sync copies 
`RowsetMetaCloudPB.delete_predicate` into `RowsetMetaPB` without converting it 
(`cloud_rowset_meta_to_doris`), so upgraded/cloud rowsets can still reach this 
reader with only legacy string delete predicates. In that case 
`_delete_handler` will apply `v = '--'`, but `_reader_context.all_access_paths` 
keeps the FE OFFSET/NULL path for `v`, so the same deleted-row leak remains. 
Please mirror the `DeleteHandler` fallback here by handling `sub_predicates()` 
when v2 is empty, or collect the affected column ids through the same 
parsing/conversion helper, before leaving the column eligible for meta-only 
reads.



##########
regression-test/suites/delete_p0/test_offset_only_delete_leak.groovy:
##########
@@ -0,0 +1,67 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// When a query only needs string lengths (length(v), or v='' which is 
rewritten to
+// length=0), a dict-encoded varchar column is read in OFFSET_ONLY mode, which 
only
+// materializes lengths and leaves the char content empty. If that same column 
also
+// carries a DELETE condition, the storage-side delete predicate compares the 
empty
+// content and matches nothing, so the deleted rows leak back into the scan 
output.
+suite("test_offset_only_delete_leak", "p0") {
+    def tbl = "test_offset_only_delete_leak"

Review Comment:
   This new regression test does not follow the test rules in `AGENTS.md`: 
ordinary single-table tests should hardcode the table name instead of `def 
tbl`, deterministic expected results should be captured with `qt_`/`order_qt_` 
plus generated `.out` rather than `assertEquals`/`assertTrue`, and the table 
should not be dropped at the end. Please rewrite the case in that style so the 
expected results are versioned by the regression framework and the table 
remains for debugging after failures.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to