924060929 commented on PR #62205:
URL: https://github.com/apache/doris/pull/62205#issuecomment-4852481056
Post-merge review of the OFFSET path-suffix scheme introduced here. All
points below are still present on current master.
**1. Encoding the read-mode as a path component collides with user field
names.**
`"OFFSET"`/`"NULL"` share the same string namespace as struct field names.
FE separates them by case (field names lowercased, meta constants uppercase,
`equals`); BE `_check_and_set_meta_read_mode` compares with `StringCaseEqual`
(case-insensitive), so a struct field literally named `null` breaks. Verified:
```sql
CREATE TABLE t(id INT, s STRUCT<`null`:INT, x:INT>)
DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES("replication_num"="1");
INSERT INTO t VALUES(1, named_struct('null',100,'x',200));
SELECT s.`null` FROM t; -- pruning ON -> NULL (wrong); OFF -> 100
(correct)
```
BE reads the field name `null` as the NULL meta → the struct enters
`NULL_MAP_ONLY` and skips all sub-columns (also triggers for structs nested in
array/map; an `offset` field only survives because the struct reader has no
`OFFSET_ONLY` branch). Minimal fix: case-sensitive comparison in BE (matching
the field-name check right below it).
**2. `element_at(map, k)` should emit KEYS + VALUES paths at collection
time, not a `*` that is expanded later.**
It currently collects `[m, *]` (keys+values coupled). Wrapped in an
offset/null function — `cardinality(element_at(m, k))` → `[m, *, OFFSET]` — the
`*` conflicts with the read-mode (keys must be read in full to match the key,
only values are offset-only), so `expandMapStarPaths` has to split it back into
`[m, KEYS]` + `[m, VALUES, OFFSET]` in the strip phase. Since `element_at` is
inherently "keys in full + values on demand", emitting the two paths directly
at collection time avoids the round-trip; the split is required once
OFFSET/NULL exist regardless of representation.
**3. Read-mode is orthogonal to the path — keeping it in the path string is
the root of 1 & 2.**
offset-only / null-only / full is orthogonal to *which* sub-column is read.
Putting it in the path component both pollutes the field-name namespace (1) and
forces string-level post-processing (the `*` expansion in 2, the prefix
comparisons in `MetaPathStriper`). A separate type dimension on the access path
— the `DATA`/`META` already present on `ColumnAccessPath` — keeps read-mode and
path orthogonal and avoids both classes of problem.
**4. `StringEmptyToLengthRule` unconditionally rewrites `str = '' →
length(str) = 0`, losing sargability.**
`str = ''` can use inverted index / bloom / zonemap; `length(str) = 0`
cannot — a de-optimization for indexed / short string columns. Worth gating.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]