andygrove opened a new pull request, #4634: URL: https://github.com/apache/datafusion-comet/pull/4634
## Which issue does this PR close?

N/A. Part of expanding the JVM codegen dispatch coverage (the path that runs Spark's own `doGenCode` inside the Comet native pipeline for Spark-exact results).

## Rationale for this change

Several scalar string and array expressions still fell back to Spark even though they are eligible for the codegen dispatch path. This PR is the first of three (tier 1: lowest-risk wins) that close those gaps. Tier 1 is the set that either needs only a one-line dispatch registration or already worked through a `RuntimeReplaceable` rewrite and was simply mislabeled in the reference.

## What changes are included in this PR?

Two genuine wirings:

* `try_to_number` (`TryToNumber`): registered as `CometCodegenDispatch`, mirroring the existing `to_number` (`CometToNumber`).
* `filter` general lambda (`ArrayFilter`): the general lambda form now routes through the codegen dispatcher, like the other higher-order functions (`transform`, `exists`, `forall`). The `array_compact` form (`filter(arr, x -> x is not null)`) keeps its native fast path to avoid the per-batch JNI cost.

Three expressions that already ran natively through their `RuntimeReplaceable` rewrites get SQL test coverage and a corrected reference status (they were marked Planned but already worked, similar to the recent `dayname` / `monthname` correction):

* `regexp_count` rewrites to `size(regexp_extract_all(...))`.
* `regexp_substr` rewrites to `nullif(regexp_extract(...), '')`.
* `try_to_binary` rewrites to `try_eval(to_binary(...))`.

`docs/source/user-guide/latest/expressions.md` flips all five from Planned to Supported.

## How are these changes tested?

New Comet SQL file tests under `spark/src/test/resources/sql-tests/expressions/`:

* `string/try_to_number.sql`
* `string/regexp_count.sql`
* `string/regexp_substr.sql`
* `string/try_to_binary.sql`

plus the existing `array/array_filter.sql`, upgraded from `spark_answer_only` to `query` so the general lambda now asserts native execution.
Each was run with `CometSqlFileTestSuite` on Spark 3.5 and passes (native execution plus result match against Spark).
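
For readers unfamiliar with the three `RuntimeReplaceable` rewrites listed in this PR, here is a rough Python model of their shape. It is only an illustration, not Comet or Spark code: every helper name below is hypothetical, Python's `re` stands in for Java regex, `to_binary` is modeled as hex decoding only, and null-input propagation is not modeled.

```python
import re
from typing import Callable, List, Optional


def regexp_extract_all(text: str, pattern: str) -> List[str]:
    # Hypothetical stand-in: every whole match (group 0) of the pattern.
    return [m.group(0) for m in re.finditer(pattern, text)]


def regexp_extract(text: str, pattern: str) -> str:
    # Hypothetical stand-in: first whole match, or '' when nothing matches.
    m = re.search(pattern, text)
    return m.group(0) if m else ""


def nullif(a: str, b: str) -> Optional[str]:
    # SQL NULLIF: NULL (None here) when both arguments are equal.
    return None if a == b else a


def try_eval(fn: Callable[[str], bytes], arg: str) -> Optional[bytes]:
    # Model of try_eval: turn an evaluation error into NULL (None here).
    try:
        return fn(arg)
    except ValueError:
        return None


def to_binary(text: str) -> bytes:
    # Assumption: only the hex format is modeled; invalid hex raises ValueError.
    return bytes.fromhex(text)


def regexp_count(text: str, pattern: str) -> int:
    # regexp_count -> size(regexp_extract_all(...))
    return len(regexp_extract_all(text, pattern))


def regexp_substr(text: str, pattern: str) -> Optional[str]:
    # regexp_substr -> nullif(regexp_extract(...), '')
    return nullif(regexp_extract(text, pattern), "")


def try_to_binary(text: str) -> Optional[bytes]:
    # try_to_binary -> try_eval(to_binary(...))
    return try_eval(to_binary, text)
```

Because each function is defined purely in terms of the expressions it rewrites to, any engine that already supports those inner expressions gets the outer one for free, which is why these three needed only test coverage and a status correction.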
