zhulipeng opened a new pull request, #12152:
URL: https://github.com/apache/gluten/pull/12152

   ## What changes are proposed in this pull request?
   
   When offloading `Like` to Velox, omit the escape literal argument and emit
   the 2-arg form (`like(input, pattern)`) when both:
   
   1. the pattern is a constant `Literal`, and
   2. the escape char is Spark's default `\` AND the pattern does not contain `\`.
   
   Otherwise, we still emit the 3-arg form (`like(input, pattern, escape)`) as before.
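   
   The rule above can be sketched as a small predicate. This is a minimal illustration, not the code in this PR: `LikeAritySketch`, `TwoArg`, `ThreeArg`, and `chooseLikeArgs` are hypothetical names, and the pattern is modeled as an `Option[String]` (`Some` for a constant literal, `None` for any other expression).

```scala
// Hypothetical sketch of the arity decision; all names are illustrative only.
object LikeAritySketch {
  sealed trait LikeArgs
  case object TwoArg extends LikeArgs   // like(input, pattern)
  case object ThreeArg extends LikeArgs // like(input, pattern, escape)

  // Spark's default escape character for LIKE.
  val DefaultEscape: Char = '\\'

  // constantPattern is Some(p) when the pattern is a constant literal,
  // None when it is any other expression.
  def chooseLikeArgs(constantPattern: Option[String], escapeChar: Char): LikeArgs =
    constantPattern match {
      // Both conditions hold: default escape, and no escape char in the pattern.
      case Some(p) if escapeChar == DefaultEscape && p.indexOf(DefaultEscape) < 0 =>
        TwoArg
      // Non-constant pattern, custom escape, or pattern containing a backslash.
      case _ =>
        ThreeArg
    }
}
```

   Under these assumptions, `'%special%requests%'` with the default escape selects the 2-arg form, while a pattern such as `100\%` or a custom `ESCAPE '#'` keeps the 3-arg form.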
   
   ## Why are the changes needed?
   
   Spark's `Like` node always carries an `escapeChar` (defaulting to `\`) even
   when the SQL did not specify `ESCAPE`. We previously always sent the 3-arg form to
   Velox, which forces Velox's `makeLike` (`Re2Functions.cpp`) to take the
   escape-aware path: `parsePattern` runs an extra unescape pass, and
   `determinePatternKind` runs with `escapeChar.has_value() == true`, even when
   no actual escaping is needed.
   
   When the pattern literal contains no `\`, the 2-arg and 3-arg forms are
   semantically identical, so we can safely send the cheaper 2-arg form. Velox
   already registers both signatures via `likeSignatures()`.
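   
   The equivalence claim can be checked with a toy LIKE-to-regex translator. This is a simplified sketch, not Spark's or Velox's implementation: `likeToRegex` is a hypothetical helper, and a trailing escape character is treated as a literal here purely to keep the example short.

```scala
import java.util.regex.Pattern

// Hypothetical sketch: a toy translation of a LIKE pattern to a Java regex.
object LikeEquivalenceSketch {
  // With escape = None, every character other than % and _ is literal.
  // With escape = Some(e), the character following e is taken literally.
  def likeToRegex(pattern: String, escape: Option[Char]): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < pattern.length) {
      val c = pattern.charAt(i)
      if (escape.contains(c) && i + 1 < pattern.length) {
        // Escaped character: emit it as a quoted literal and skip both chars.
        sb.append(Pattern.quote(pattern.charAt(i + 1).toString))
        i += 2
      } else {
        c match {
          case '%'   => sb.append(".*") // any sequence of characters
          case '_'   => sb.append(".")  // any single character
          case other => sb.append(Pattern.quote(other.toString))
        }
        i += 1
      }
    }
    sb.toString
  }
}
```

   When the pattern contains no escape character, the escape branch is never taken, so both calls produce the same regex. A pattern such as `100\%` is where the two forms diverge.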
   
   ## How was this patch tested?
   
   - Existing `Like` query coverage in `VeloxStringFunctionsSuite` (`like` /
       `rlike` / `ilike`) — query results unchanged.
   - TPC-H Q13 end-to-end run at 6 TB scale — see Performance section.
   - `./dev/format-scala-code.sh` clean.
   
   ## Performance
   
   **TPC-H Q13 @ 6 TB scale: >6% end-to-end latency reduction.**
   
   Q13's `o_comment NOT LIKE '%special%requests%'` filter scans every `orders`
   row. With the 3-arg form, the constant-pattern fast paths in
   `determinePatternKind` are bypassed and Velox falls back to the generic
   `LikeWithRe2` path, which hot-loops in `re2::DFA::InlinedSearchLoop`. CPU
   profiling shows that `InlinedSearchLoop` accounts for **>8% of total cycles**
   on Q13.
   
   Sending the 2-arg form lets `determinePatternKind` recognize this as the
   `kSubstrings` shape and dispatch to the dedicated `OptimizedLike<kSubstrings>`
   kernel, eliminating the RE2 DFA cost. No regression observed on other queries.
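   
   To illustrate why a `%a%b%` shape does not need a regex engine, the sketch below matches it with ordered substring searches. It is an illustration of the idea only, not Velox's kernel: `SubstringsLikeSketch` and `matchesSubstrings` are hypothetical names, and the pattern is assumed to be already split into its literal parts.

```scala
// Hypothetical sketch: matching a %a%b% pattern without a regex engine.
object SubstringsLikeSketch {
  // Returns true when every part occurs in input, in order and without overlap.
  // For the pattern %special%requests%, parts = Seq("special", "requests").
  def matchesSubstrings(input: String, parts: Seq[String]): Boolean = {
    var from = 0
    val it = parts.iterator
    while (it.hasNext) {
      val part = it.next()
      val at = input.indexOf(part, from)
      if (at < 0) return false // this part is missing after the previous match
      from = at + part.length  // continue searching after the matched part
    }
    true
  }
}
```

   For Q13's filter, the `NOT LIKE` result would be the negation of `matchesSubstrings(comment, Seq("special", "requests"))`.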
   
   ## Was this patch authored or co-authored using generative AI tooling?
   
   Reviewed-by: Claude claude-opus-4-7
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]