andygrove opened a new pull request, #4634:
URL: https://github.com/apache/datafusion-comet/pull/4634

   ## Which issue does this PR close?
   
   N/A. Part of expanding the JVM codegen dispatch coverage (the path that runs 
Spark's own `doGenCode` inside the Comet native pipeline for Spark-exact 
results).
   
   ## Rationale for this change
   
   Several scalar string and array expressions still fall back to Spark even 
though they are eligible for the codegen dispatch path. This PR is the first of 
three (tier 1: lowest-risk wins) that close those gaps. Tier 1 covers the 
expressions that either need only a one-line dispatch registration or already 
work through a `RuntimeReplaceable` rewrite and were simply mislabeled in the 
expression reference.
   
   ## What changes are included in this PR?
   
   Two genuine wirings:
   
   * `try_to_number` (`TryToNumber`): registered as `CometCodegenDispatch`, 
mirroring the existing `to_number` (`CometToNumber`).
   * `filter` general lambda (`ArrayFilter`): the general lambda form now routes through the codegen dispatcher, like the other higher-order functions (`transform`, `exists`, `forall`). The `array_compact` form (`filter(arr, x -> x is not null)`) keeps its native fast path to avoid the per-batch JNI cost.
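
   The routing described for `filter` can be pictured with a small Python model. This is only a sketch under stated assumptions: `Lambda`, `choose_path`, and the path labels are hypothetical names invented for illustration and are not Comet APIs. The real decision is made on Spark expression trees inside Comet's serde layer.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass(frozen=True)
class Lambda:
    """Hypothetical stand-in for a lambda argument of filter()."""
    kind: str                              # "is_not_null" or "general"
    fn: Callable[[Any], Optional[bool]]    # may return None to model SQL NULL


# The one shape that corresponds to the array_compact form.
IS_NOT_NULL = Lambda("is_not_null", lambda x: x is not None)


def choose_path(lam: Lambda) -> str:
    # Assumed rule: the `x -> x is not null` shape keeps the native fast path,
    # every other lambda goes through the codegen dispatcher.
    if lam.kind == "is_not_null":
        return "native_array_compact"
    return "codegen_dispatch"


def array_filter(arr: Optional[List[Any]], lam: Lambda) -> Optional[List[Any]]:
    # SQL-style semantics: a NULL array stays NULL, and an element is kept
    # only when the predicate is true (a NULL predicate result drops it).
    if arr is None:
        return None
    return [x for x in arr if lam.fn(x) is True]


# A general lambda: returns None (NULL) for NULL elements, like `x -> x > 1`.
greater_than_one = Lambda("general", lambda x: None if x is None else x > 1)
```

   In this model both paths compute the same kind of result; the only difference is which engine path the lambda is assigned to, which is why the `array_compact` shape can stay on the cheaper path.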
   
   Three expressions that already ran natively through their 
`RuntimeReplaceable` rewrites get SQL test coverage and a corrected reference 
status (they were marked Planned but already worked, similar to the recent 
`dayname` / `monthname` correction):
   
   * `regexp_count` rewrites to `size(regexp_extract_all(...))`.
   * `regexp_substr` rewrites to `nullif(regexp_extract(...), '')`.
   * `try_to_binary` rewrites to `try_eval(to_binary(...))`.
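
   The three rewrites above can be sketched in Python to show why each composite form gives the expected result. This is an illustrative model only: the helper functions are simplified stand-ins written for this example, Python's `re` module is not identical to the Java regex engine Spark uses, and NULL inputs and the optional group-index argument are not modeled.

```python
import base64
import re
from typing import List, Optional


def regexp_extract_all(s: str, pattern: str) -> List[str]:
    # Whole-match text (group 0) for every non-overlapping match.
    return [m.group(0) for m in re.finditer(pattern, s)]


def regexp_extract(s: str, pattern: str) -> str:
    # First whole match, or the empty string when nothing matches.
    m = re.search(pattern, s)
    return m.group(0) if m else ""


def regexp_count(s: str, pattern: str) -> int:
    # Models size(regexp_extract_all(...)).
    return len(regexp_extract_all(s, pattern))


def regexp_substr(s: str, pattern: str) -> Optional[str]:
    # Models nullif(regexp_extract(...), ''): an empty result becomes None.
    first = regexp_extract(s, pattern)
    return None if first == "" else first


def to_binary(s: str, fmt: str = "hex") -> bytes:
    # Simplified stand-in that raises ValueError on malformed input.
    if fmt == "hex":
        return bytes.fromhex(s)
    if fmt == "base64":
        return base64.b64decode(s, validate=True)
    if fmt == "utf-8":
        return s.encode("utf-8")
    raise ValueError(f"unsupported format: {fmt}")


def try_to_binary(s: str, fmt: str = "hex") -> Optional[bytes]:
    # Models try_eval(to_binary(...)): a conversion error becomes None.
    # Simplification: this sketch also maps an unknown format to None,
    # which Spark itself may handle differently.
    try:
        return to_binary(s, fmt)
    except ValueError:
        return None
```

   For example, in this model `regexp_substr("abc", r"\d+")` yields `None` because the inner extract returns an empty string, and `try_to_binary("zz", "hex")` yields `None` because the inner conversion raises.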
   
   `docs/source/user-guide/latest/expressions.md` flips all five from Planned 
to Supported.
   
   ## How are these changes tested?
   
   New Comet SQL file tests under 
`spark/src/test/resources/sql-tests/expressions/`: `string/try_to_number.sql`, 
`string/regexp_count.sql`, `string/regexp_substr.sql`, 
`string/try_to_binary.sql`, plus the existing `array/array_filter.sql` upgraded 
from `spark_answer_only` to `query` so the general lambda now asserts native 
execution. Each was run with `CometSqlFileTestSuite` on Spark 3.5 and passes 
(native execution plus result match against Spark).
   



