924060929 commented on PR #62205:
URL: https://github.com/apache/doris/pull/62205#issuecomment-4852481056

   Post-merge review of the OFFSET path-suffix scheme introduced here. All 
points below are still present on current master.
   
   **1. Encoding the read-mode as a path component collides with user field 
names.**
   `"OFFSET"`/`"NULL"` share the same string namespace as struct field names. 
FE separates them by case (field names lowercased, meta constants uppercase, 
`equals`); BE `_check_and_set_meta_read_mode` compares with `StringCaseEqual` 
(case-insensitive), so a struct field literally named `null` breaks. Verified:
   
   ```sql
   CREATE TABLE t(id INT, s STRUCT<`null`:INT, x:INT>)
     DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1 
PROPERTIES("replication_num"="1");
   INSERT INTO t VALUES(1, named_struct('null',100,'x',200));
   SELECT s.`null` FROM t;   -- pruning ON -> NULL (wrong);  OFF -> 100 
(correct)
   ```
   
   BE reads the field name `null` as the NULL meta → the struct enters 
`NULL_MAP_ONLY` and skips all sub-columns (also triggers for structs nested in 
array/map; an `offset` field only survives because the struct reader has no 
`OFFSET_ONLY` branch). Minimal fix: case-sensitive comparison in BE (matching 
the field-name check right below it).
   
   **2. `element_at(map, k)` should emit KEYS + VALUES paths at collection 
time, not a `*` that is expanded later.**
   It currently collects `[m, *]` (keys+values coupled). Wrapped in an 
offset/null function — `cardinality(element_at(m, k))` → `[m, *, OFFSET]` — the 
`*` conflicts with the read-mode (keys must be read in full to match the key, 
only values are offset-only), so `expandMapStarPaths` has to split it back into 
`[m, KEYS]` + `[m, VALUES, OFFSET]` in the strip phase. Since `element_at` is 
inherently "keys in full + values on demand", emitting the two paths directly 
at collection time avoids the round-trip; the split is required once 
OFFSET/NULL exist regardless of representation.
   
   **3. Read-mode is orthogonal to the path — keeping it in the path string is 
the root of 1 & 2.**
   offset-only / null-only / full is orthogonal to *which* sub-column is read. 
Putting it in the path component both pollutes the field-name namespace (1) and 
forces string-level post-processing (the `*` expansion in 2, the prefix 
comparisons in `MetaPathStriper`). A separate type dimension on the access path 
— the `DATA`/`META` already present on `ColumnAccessPath` — keeps read-mode and 
path orthogonal and avoids both classes of problem.
   
   **4. `StringEmptyToLengthRule` unconditionally rewrites `str = '' → 
length(str) = 0`, losing sargability.**
   `str = ''` can use inverted index / bloom / zonemap; `length(str) = 0` 
cannot — a de-optimization for indexed / short string columns. Worth gating.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to