airborne12 opened a new pull request, #61584:
URL: https://github.com/apache/doris/pull/61584

   ## Summary
   - Fix `visitMatch()` crash ("SlotReference in Match failed to get Column") when MATCH references alias slots that lost column metadata (e.g., `CAST(variant['key'] AS VARCHAR) AS fn`)
   - Add graceful fallback in `ExpressionTranslator.visitMatch()` when slot metadata is missing
   - New rewrite rule `PushDownMatchPredicateAsVirtualColumn` that extracts MATCH from join/filter predicates and pushes it as a virtual column on OlapScan for inverted index evaluation
   
   ## Problem
   MATCH crashes when the following two conditions hold together:
   1. The MATCH left side is an alias over a non-trivial expression (Cast, ElementAt, etc.), so `Alias.toSlot()` loses the `originalColumn`/`originalTable` metadata.
   2. An OR predicate references join-dependent columns (`l.objectId IS NOT NULL`, EXISTS mark `$c$1`), which prevents MATCH from being pushed below the join.

   As a result, MATCH is stuck at the join layer referencing a metadata-less alias slot, and `visitMatch()` throws.
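
   Why the OR blocks pushdown can be illustrated with a simplified check: a predicate can only move below a join when every slot it reads is produced by a single join input. The class and method names below are hypothetical and slots are modeled as plain strings; this is a sketch of the general idea, not the Doris planner code.

   ```java
   import java.util.Set;

   // Simplified, hypothetical check; not the actual Doris planner code.
   final class PushdownCheckSketch {
       /**
        * Returns true when every slot the predicate reads is produced by
        * the given join child, so the predicate could be evaluated there.
        */
       static boolean canPushBelowJoin(Set<String> referencedSlots, Set<String> childOutput) {
           return childOutput.containsAll(referencedSlots);
       }
   }
   ```

   Under this model, `firstName MATCH_ANY 'john'` alone reads only left-side slots, but the whole OR also reads `l.objectId`, so it has to stay above the join.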
   
   **Reproducer:**
   ```sql
   WITH contacts AS (
     SELECT objectId, CAST(overflowProperties['string_8'] AS VARCHAR) AS firstName
     FROM objects_small WHERE portalId = 865815822
   ),
   lists AS (
     SELECT objectId FROM lists_v2 WHERE portalId = 865815822
   )
   SELECT o.objectId
   FROM contacts o LEFT JOIN lists l ON o.objectId = l.objectId
   WHERE firstName MATCH_ANY 'john' OR l.objectId IS NOT NULL;
   -- ERROR: SlotReference in Match failed to get Column
   ```
   
   ## Solution
   1. **`ExpressionTranslator.visitMatch()`**: now calls `getOriginalColumn().orElse(null)` instead of `orElseThrow()`. When column/table metadata is missing, `invertedIndex` is `null` and BE falls back to slow-path expression evaluation.
   
   2. **`PushDownMatchPredicateAsVirtualColumn`** (new rewrite rule): Traces the MATCH's alias slot back through the Project to the original column expression, creates a virtual column `(original_expr MATCH_ANY 'term')` on OlapScan, and replaces the MATCH in the predicate with the boolean slot reference. BE then evaluates it via `fast_execute()` using the inverted index.
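
   The fallback in item 1 comes down to the difference between `Optional.orElseThrow()` and `Optional.orElse(null)`. A minimal sketch, assuming a hypothetical `Slot` type that carries an optional column name (the real Nereids classes are not reproduced here):

   ```java
   import java.util.Optional;

   // Hypothetical stand-ins; not the real Doris Nereids classes.
   final class MatchFallbackSketch {
       /** A slot that may or may not still carry its source column name. */
       static final class Slot {
           final String name;
           final Optional<String> originalColumn;

           Slot(String name, Optional<String> originalColumn) {
               this.name = name;
               this.originalColumn = originalColumn;
           }
       }

       /** Old behavior: missing metadata aborts translation. */
       static String resolveStrict(Slot slot) {
           return slot.originalColumn.orElseThrow(
                   () -> new IllegalStateException("SlotReference in Match failed to get Column"));
       }

       /** New behavior: missing metadata yields null, i.e. no inverted index. */
       static String resolveLenient(Slot slot) {
           return slot.originalColumn.orElse(null);
       }

       /** Picks the evaluation path from the resolved column (labels are illustrative). */
       static String choosePath(Slot slot) {
           return resolveLenient(slot) == null ? "slow-path" : "inverted-index";
       }
   }
   ```

   With the lenient variant, an alias slot such as `fn` no longer aborts translation; it only loses access to the index-based path.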
   
   **Plan transformation:**
   ```
   Before:
     Filter(fn MATCH_ANY 'john' OR l.objectId IS NOT NULL)  ← crashes or slow path
       └── Join → Project[CAST(col) as fn] → OlapScan
   
   After:
     Filter(__match_vc OR l.objectId IS NOT NULL)  ← boolean reference, no crash
       └── Join → Project[fn, __match_vc] → OlapScan[virtualColumns=[(CAST(col) MATCH_ANY 'john')]]
                                                       ↑ inverted index fast path
   ```
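
   The plan transformation above can be mimicked on a toy model where expressions are plain strings. Everything in this sketch is an assumption made for illustration: the class name, the string-based matching on ` MATCH_ANY `, and the numbered `__match_vc0` naming are hypothetical, and the actual rule operates on Nereids plan nodes rather than strings.

   ```java
   import java.util.ArrayList;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   // Toy model with strings for expressions; all names here are hypothetical.
   final class MatchPushdownSketch {
       /** Result of the rewrite: new predicate plus virtual columns for the scan. */
       static final class Rewritten {
           final String predicate;
           final Map<String, String> virtualColumns;

           Rewritten(String predicate, Map<String, String> virtualColumns) {
               this.predicate = predicate;
               this.virtualColumns = virtualColumns;
           }
       }

       /**
        * @param disjuncts      OR-ed predicate terms, e.g. "fn MATCH_ANY 'john'"
        * @param projectAliases alias name to original expression, taken from the Project
        */
       static Rewritten rewrite(List<String> disjuncts, Map<String, String> projectAliases) {
           Map<String, String> virtualColumns = new LinkedHashMap<>();
           List<String> out = new ArrayList<>();
           for (String term : disjuncts) {
               int idx = term.indexOf(" MATCH_ANY ");
               String alias = idx < 0 ? null : term.substring(0, idx);
               if (alias != null && projectAliases.containsKey(alias)) {
                   // Trace the alias back to its original expression and
                   // register the MATCH as a boolean virtual column.
                   String vcName = "__match_vc" + virtualColumns.size();
                   String original = projectAliases.get(alias);
                   virtualColumns.put(vcName, "(" + original + term.substring(idx) + ")");
                   out.add(vcName); // the predicate now reads the boolean slot
               } else {
                   out.add(term); // non-MATCH terms are left untouched
               }
           }
           return new Rewritten(String.join(" OR ", out), virtualColumns);
       }
   }
   ```

   Terms that are not MATCH, or whose left side cannot be traced to a Project alias, are left unchanged in this sketch.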
   
   ## Test plan
   - [x] Manual test: verified with variant subcolumn + EXISTS + OR, LEFT JOIN + OR
   - [ ] Regression test
   - [ ] Unit Test
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

