This is an automated email from the ASF dual-hosted git repository.
englefly pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new a0907700fcb [fix](NestedColumnPruning): visit key/value/entry args in
visitMapContainsKey/Value/Entry (#64440)
a0907700fcb is described below
commit a0907700fcbbdcd3210036d7bbf9114b99c39a84
Author: minghong <[email protected]>
AuthorDate: Fri Jun 19 11:49:27 2026 +0800
[fix](NestedColumnPruning): visit key/value/entry args in
visitMapContainsKey/Value/Entry (#64440)
### What problem does this PR solve?
**Symptom**: with nested column pruning on
(`enable_prune_nested_columns=true`, the default),
`map_contains_key/value/entry` can return **wrong results** when its
search argument references a nested sub-column that is *also* used in an
`IS NULL` check in the same query.
Repro:
```sql
CREATE TABLE t (id INT, s STRUCT<a:STRING,b:INT>, m MAP<STRING,INT>)
DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");
INSERT INTO t VALUES (1, named_struct('a','hello','b',100), {'hello':1});
SELECT element_at(s,'a') IS NULL,
map_contains_key(m, element_at(s,'a'))
FROM t;
-- expected: false, true
-- before this PR: false, false <-- map_contains_key is wrong
```
**Root cause**:
`AccessPathExpressionCollector.visitMapContainsKey/Value/Entry` only
recursed into the map argument (`getArgument(0)`) and never visited the
key/value search argument. So the *data* access path for the search
sub-column (`s.a`) was never registered. The only path registered for
`s.a` came from `element_at(s,'a') IS NULL`, which is the metadata-only
path `[s, a, NULL]`. With that as the sole path, `NestedColumnPruning`
pruned `s.a` to null-flag-only — at runtime the real value of `s.a` is
not read from disk, so `element_at(s,'a')` no longer yields `'hello'`
and `map_contains_key` matches against a wrong key, returning the wrong
result.
### How
1. In `visitMapContainsKey/Value/Entry`, after visiting the map
argument, also visit the key/value/entry search argument(s) with a
**fresh `CollectorContext`**, so their data access paths (e.g. `[s, a]`)
get registered and override the null-only path.
2. When the whole `map_contains_*(...)` is itself wrapped in `IS NULL`
(the incoming path is exactly `[NULL]`), route the NULL suffix to the
**map** argument (producing `[m, NULL]`) and skip the search args. This
is safe because all three functions declare `nullable() ==
child(0).nullable()` — the result is NULL iff the map is NULL (on BE the
result null flag is set solely from the map's null map, and a NULL
search key is treated as "look for a NULL element", never as
SQL-unknown). It mirrors the existing `visitMapKeys`/`visitMapValues`
handling and avoids emitting a malformed `[m, KEYS, NULL]` path.
Issue Number: close #xxx
### Release note
Fix wrong results of
`map_contains_key`/`map_contains_value`/`map_contains_entry` when the
search argument is a nested sub-column that is also used in an `IS NULL`
check, with nested column pruning enabled.
### Check List (For Author)
- Test
- [x] Regression test:
`regression-test/suites/nereids_rules_p0/column_pruning/map_contains_arg_pruning.groovy`
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason
- Behavior changed:
- [ ] No.
- [x] Yes. Fixes wrong results of `map_contains_key/value/entry`; this
is a data-correctness fix and should be back-picked to active release
branches.
- Does this need documentation?
- [x] No.
- [ ] Yes.
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
---------
Co-authored-by: Claude Opus 4.8 <[email protected]>
---
.../rewrite/AccessPathExpressionCollector.java | 70 ++++++-
.../column_pruning/map_contains_arg_pruning.out | 55 ++++++
.../column_pruning/map_contains_arg_pruning.groovy | 215 +++++++++++++++++++++
3 files changed, 333 insertions(+), 7 deletions(-)
diff --git
a/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AccessPathExpressionCollector.java
b/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AccessPathExpressionCollector.java
index e7dbc5ca05d..75dce5e3618 100644
---
a/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AccessPathExpressionCollector.java
+++
b/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AccessPathExpressionCollector.java
@@ -341,7 +341,7 @@ public class AccessPathExpressionCollector extends
DefaultExpressionVisitor<Void
@Override
public Void visitMapKeys(MapKeys mapKeys, CollectorContext context) {
LinkedList<String> suffixPath = context.accessPathBuilder.accessPath;
- if (isFunctionNullCheckPath(suffixPath)) {
+ if (isUnderIsNull(suffixPath)) {
// map_keys(nullable_map) returns a NULL array only when the
parent map is NULL.
// The NULL suffix therefore belongs to the map itself, not to the
KEYS child.
return continueCollectAccessPath(mapKeys.getArgument(0), context);
@@ -360,7 +360,7 @@ public class AccessPathExpressionCollector extends
DefaultExpressionVisitor<Void
@Override
public Void visitMapValues(MapValues mapValues, CollectorContext context) {
LinkedList<String> suffixPath = context.accessPathBuilder.accessPath;
- if (isFunctionNullCheckPath(suffixPath)) {
+ if (isUnderIsNull(suffixPath)) {
// map_values(nullable_map) returns a NULL array only when the
parent map is NULL.
// A map entry whose value is NULL still produces a non-NULL
values array.
return continueCollectAccessPath(mapValues.getArgument(0),
context);
@@ -376,26 +376,82 @@ public class AccessPathExpressionCollector extends
DefaultExpressionVisitor<Void
return continueCollectAccessPath(mapValues.getArgument(0), context);
}
- private static boolean isFunctionNullCheckPath(List<String> suffixPath) {
+ private static boolean isUnderIsNull(List<String> suffixPath) {
return suffixPath.size() == 1 &&
AccessPathInfo.ACCESS_NULL.equals(suffixPath.get(0));
}
@Override
public Void visitMapContainsKey(MapContainsKey mapContainsKey,
CollectorContext context) {
+ // MAP_CONTAINS_KEY(<map>, <key>)
+ //
+ // isUnderIsNull checks whether the parent of this expression is IS
NULL,
+ // splitting queries into two shapes:
+ //
+ // Shape A (parent is IS NULL):
+ // SQL: SELECT ... WHERE map_contains_key(m, k) IS NULL
+ // map_contains_key(m, k) returns NULL only when m itself is NULL —
so the path
+ // should be m.NULL, not m.KEYS.NULL
+ //
+ // Shape B (regular predicate):
+ // SQL: SELECT ... WHERE map_contains_key(m, element_at(s, 'city'))
+ // AND element_at(s, 'city') IS NULL
+ // We add the KEYS prefix for the map column and visit key arg:
`element_at(s, 'city')` with a
+ // fresh context.
+ // s collects two paths:
+ // [s, city] ← from key arg (fresh context → DATA path)
+ // [s, city, NULL] ← from IS NULL
+ // NestedColumnPruning sees [s, city] and strips [s, city, NULL] in
prune phase.
+ if (isUnderIsNull(context.accessPathBuilder.accessPath)) {
+ // Shape A: skip KEYS prefix, route NULL directly to the map
column.
+ return continueCollectAccessPath(mapContainsKey.getArgument(0),
context);
+ }
+ // Shape B: map argument — only the key sub-column is needed.
context.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_MAP_KEYS);
- return continueCollectAccessPath(mapContainsKey.getArgument(0),
context);
+ continueCollectAccessPath(mapContainsKey.getArgument(0), context);
+ // Shape B: key argument — visit with a fresh context to register
full-data paths.
+ Expression keyArg = mapContainsKey.getArgument(1);
+ CollectorContext keyCtx = new
CollectorContext(context.statementContext, context.bottomFilter);
+ continueCollectAccessPath(keyArg, keyCtx);
+ return null;
}
@Override
public Void visitMapContainsValue(MapContainsValue mapContainsValue,
CollectorContext context) {
+ // MAP_CONTAINS_VALUE(<map>, <value>)
+ // Same two-shape logic as visitMapContainsKey; see that method for
the full rationale.
+ //
+ // Shape A (parent is IS NULL): skip VALUES prefix, route NULL to m →
m.NULL.
+ // Shape B (regular predicate): add VALUES prefix for m, visit value
arg with fresh context.
+ if (isUnderIsNull(context.accessPathBuilder.accessPath)) {
+ return continueCollectAccessPath(mapContainsValue.getArgument(0),
context);
+ }
context.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_MAP_VALUES);
- return continueCollectAccessPath(mapContainsValue.getArgument(0),
context);
+ continueCollectAccessPath(mapContainsValue.getArgument(0), context);
+ Expression valueArg = mapContainsValue.getArgument(1);
+ CollectorContext valueCtx = new
CollectorContext(context.statementContext, context.bottomFilter);
+ continueCollectAccessPath(valueArg, valueCtx);
+ return null;
}
@Override
public Void visitMapContainsEntry(MapContainsEntry mapContainsEntry,
CollectorContext context) {
+ // MAP_CONTAINS_ENTRY(<map>, <key>, <value>)
+ // Same two-shape logic as visitMapContainsKey; see that method for
the full rationale.
+ //
+ // Shape A (parent is IS NULL): skip sub-column prefix, route NULL to
m → m.NULL.
+ // Shape B (regular predicate): add ACCESS_ALL for m (needs both keys
and values),
+ // visit key/value args with fresh context.
+ if (isUnderIsNull(context.accessPathBuilder.accessPath)) {
+ return continueCollectAccessPath(mapContainsEntry.getArgument(0),
context);
+ }
context.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_ALL);
- return continueCollectAccessPath(mapContainsEntry.getArgument(0),
context);
+ continueCollectAccessPath(mapContainsEntry.getArgument(0), context);
+ for (int i = 1; i < mapContainsEntry.arity(); i++) {
+ Expression entryArg = mapContainsEntry.getArgument(i);
+ CollectorContext entryCtx = new
CollectorContext(context.statementContext, context.bottomFilter);
+ continueCollectAccessPath(entryArg, entryCtx);
+ }
+ return null;
}
@Override
@@ -575,7 +631,7 @@ public class AccessPathExpressionCollector extends
DefaultExpressionVisitor<Void
@Override
public Void visitIf(If ifExpr, CollectorContext context) {
- if (isFunctionNullCheckPath(context.accessPathBuilder.accessPath)) {
+ if (isUnderIsNull(context.accessPathBuilder.accessPath)) {
ifExpr.getCondition().accept(this, new
CollectorContext(context.statementContext, context.bottomFilter));
ifExpr.getTrueValue().accept(this, copyContext(context));
ifExpr.getFalseValue().accept(this, copyContext(context));
diff --git
a/regression-test/data/nereids_rules_p0/column_pruning/map_contains_arg_pruning.out
b/regression-test/data/nereids_rules_p0/column_pruning/map_contains_arg_pruning.out
new file mode 100644
index 00000000000..498c202752f
--- /dev/null
+++
b/regression-test/data/nereids_rules_p0/column_pruning/map_contains_arg_pruning.out
@@ -0,0 +1,55 @@
+-- This file is automatically generated. You should know what you did if you
want to edit this
+-- !case1 --
+1 false true
+2 false true
+3 true \N
+4 true false
+
+-- !case2 --
+1 false false
+2 false false
+3 false \N
+4 true false
+
+-- !case3 --
+1 false false false
+2 false false false
+3 true false \N
+4 true true false
+
+-- !case4 --
+1 false
+2 false
+3 true
+4 false
+
+-- !case5 --
+1 false
+2 false
+3 true
+4 false
+
+-- !case6 --
+1 false
+2 false
+3 true
+4 false
+
+-- !case7a --
+1 true
+2 true
+3 false
+4 true
+
+-- !case7b --
+1 true
+2 true
+3 false
+4 true
+
+-- !case7c --
+1 true
+2 true
+3 false
+4 true
+
diff --git
a/regression-test/suites/nereids_rules_p0/column_pruning/map_contains_arg_pruning.groovy
b/regression-test/suites/nereids_rules_p0/column_pruning/map_contains_arg_pruning.groovy
new file mode 100644
index 00000000000..ea2a51778db
--- /dev/null
+++
b/regression-test/suites/nereids_rules_p0/column_pruning/map_contains_arg_pruning.groovy
@@ -0,0 +1,215 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Regression tests for map_contains_key / map_contains_value /
map_contains_entry
+// when the key/value/entry argument references nested sub-columns.
+//
+// Bug: visitMapContainsKey/Value/Entry only visited the map argument and
skipped
+// the key/value/entry argument. When the key is a nested sub-column expression
+// (e.g. element_at(s, 'a')) whose data path was not registered, and the same
+// sub-column also appears in IS NULL, NestedColumnPruning would prune it to
+// null-only metadata access, causing wrong results.
+
+suite("map_contains_arg_pruning") {
+ sql """ DROP TABLE IF EXISTS map_contains_arg_pruning_tbl """
+ sql """
+ CREATE TABLE map_contains_arg_pruning_tbl (
+ id INT,
+ s STRUCT<a: STRING, b: INT> NULL,
+ m MAP<STRING, INT> NULL,
+ v VARIANT NULL
+ ) ENGINE = OLAP
+ DUPLICATE KEY(id)
+ DISTRIBUTED BY HASH(id) BUCKETS 1
+ PROPERTIES ("replication_allocation" = "tag.location.default: 1")
+ """
+ sql """
+ INSERT INTO map_contains_arg_pruning_tbl VALUES
+ (1, named_struct('a', 'hello', 'b', 100), {'hello': 1, 'world':
2}, '{"k": 42}'),
+ (2, named_struct('a', 'doris', 'b', 200), {'doris': 3},
NULL),
+ (3, named_struct('a', null, 'b', 300), NULL,
'{"x": 1}'),
+ (4, NULL, {},
'{}')
+ """
+
+ // ================================================================
+ // Case 1: map_contains_key + element_at IS NULL (original bug)
+ // map_contains_key(m, element_at(s, 'a')) needs full access to s.a
+ // as the key lookup value. Without fix, only [s.a.NULL] from
+ // element_at(s, 'a') IS NULL is registered.
+ // ================================================================
+ explain {
+ sql """
+ SELECT id,
+ element_at(s, 'a') IS NULL,
+ map_contains_key(m, element_at(s, 'a'))
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+ contains "nested columns"
+ contains "s.a" // s.a should appear in access
paths
+ notContains "s.a.NULL" // should NOT be null-only
+ contains "m.KEYS" // map_contains_key needs KEYS
path
+ }
+
+ order_qt_case1 """
+ SELECT id,
+ element_at(s, 'a') IS NULL,
+ map_contains_key(m, element_at(s, 'a'))
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+
+ // ================================================================
+ // Case 2: map_contains_value, value arg references sub-column
+ // map_contains_value(m, element_at(s, 'b')) needs s.b as value
+ // ================================================================
+ explain {
+ sql """
+ SELECT id,
+ element_at(s, 'b') IS NULL,
+ map_contains_value(m, element_at(s, 'b'))
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+ contains "nested columns"
+ contains "s.b" // s.b should appear in access
paths
+ notContains "s.b.NULL" // should NOT be null-only
+ }
+
+ order_qt_case2 """
+ SELECT id,
+ element_at(s, 'b') IS NULL,
+ map_contains_value(m, element_at(s, 'b'))
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+
+ // ================================================================
+ // Case 3: map_contains_entry(m, key, value) — ternary function.
+ // visitMapContainsEntry must visit BOTH search arguments (arg1=key,
+ // arg2=value) with fresh contexts. Without the fix, only arg1 was
+ // collected and arg2 was silently skipped, leaving s.b unregistered
+ // so that element_at(s, 'b') IS NULL would prune s.b to null-only.
+ // ================================================================
+ explain {
+ sql """
+ SELECT id,
+ element_at(s, 'a') IS NULL,
+ element_at(s, 'b') IS NULL,
+ map_contains_entry(m, element_at(s, 'a'), element_at(s,
'b'))
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+ contains "nested columns"
+ contains "s.a"
+ contains "s.b"
+ notContains "s.a.NULL"
+ notContains "s.b.NULL"
+ }
+
+ order_qt_case3 """
+ SELECT id,
+ element_at(s, 'a') IS NULL,
+ element_at(s, 'b') IS NULL,
+ map_contains_entry(m, element_at(s, 'a'), element_at(s, 'b'))
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+
+ // ================================================================
+ // Case 4: map_contains_key(...) IS NULL — the key argument must NOT
+ // be collected as a full-data path because map_contains_key returns
+ // NULL only when the map itself is NULL. The search key does not
+ // affect nullability.
+ //
+ // When map_contains_key(...) IS NULL is the only expression, the
+ // NULL-only context causes the key argument to be skipped entirely,
+ // so s.a is never registered and no nested column pruning is emitted.
+ // ================================================================
+ explain {
+ sql """
+ SELECT id,
+ map_contains_key(m, element_at(s, 'a')) IS NULL
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+ // s.a must NOT appear in access paths: the key argument inside
+ // map_contains_key(...) IS NULL is skipped in NULL-only context.
+ notContains "[s.a]"
+ }
+
+ order_qt_case4 """
+ SELECT id,
+ map_contains_key(m, element_at(s, 'a')) IS NULL
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+
+ // ================================================================
+ // Case 5: map_contains_value(...) IS NULL — same behaviour as
+ // map_contains_key: the value argument does not affect nullability.
+ // ================================================================
+ explain {
+ sql """
+ SELECT id,
+ map_contains_value(m, element_at(s, 'b')) IS NULL
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+ notContains "[s.b]"
+ }
+
+ order_qt_case5 """
+ SELECT id,
+ map_contains_value(m, element_at(s, 'b')) IS NULL
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+
+ // ================================================================
+ // Case 6: map_contains_entry(...) IS NULL — both key and value
+ // arguments must be skipped. No nested column pruning is expected
+ // because only the map argument is visited with NULL-only path.
+ // ================================================================
+ explain {
+ sql """
+ SELECT id,
+ map_contains_entry(m, element_at(s, 'a'), element_at(s,
'b')) IS NULL
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+ notContains "s.a"
+ notContains "s.b"
+ }
+
+ order_qt_case6 """
+ SELECT id,
+ map_contains_entry(m, element_at(s, 'a'), element_at(s, 'b'))
IS NULL
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+
+ // ================================================================
+ // Case 7: IS NOT NULL variants — NOT(IS NULL) delegates to the
+ // same visitIsNull path, so the NULL-only routing logic applies.
+ // ================================================================
+ order_qt_case7a """
+ SELECT id,
+ map_contains_key(m, element_at(s, 'a')) IS NOT NULL
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+
+ order_qt_case7b """
+ SELECT id,
+ map_contains_value(m, element_at(s, 'b')) IS NOT NULL
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+
+ order_qt_case7c """
+ SELECT id,
+ map_contains_entry(m, element_at(s, 'a'), element_at(s, 'b'))
IS NOT NULL
+ FROM map_contains_arg_pruning_tbl ORDER BY id
+ """
+}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]