andygrove opened a new pull request, #4286: URL: https://github.com/apache/datafusion-comet/pull/4286
## Which issue does this PR close?

Closes #.

## Rationale for this change

`substring_index` is a commonly used Spark string function that was not yet supported natively by Comet, causing fallback to Spark execution.

## What changes are included in this PR?

Adds native support for the `substring_index(str, delim, count)` expression by delegating to DataFusion's built-in `substr_index` function (aliased as `substring_index`), which has identical semantics to Spark. The only adaptation needed is casting the `count` argument from `IntegerType` to `LongType` to match DataFusion's function signature.

Changes:

- Added `CometSubstringIndex` serde in `strings.scala`
- Registered in `QueryPlanSerde.stringExpressions` map
- Added comprehensive Comet SQL Test covering column/literal arguments, NULL propagation, empty strings, multi-character delimiters, multibyte UTF-8, boundary delimiters, large count values, and dictionary encoding via ConfigMatrix
- Marked `substring_index` as supported in the expressions support doc

The `implement-comet-expression` skill was used to scaffold this implementation.

## How are these changes tested?

Comet SQL Test at `spark/src/test/resources/sql-tests/expressions/string/substring_index.sql` with `ConfigMatrix: parquet.enable.dictionary=false,true` (2 test configurations).

Covers:

- All-column, all-literal, and mixed column/literal argument combinations
- NULL in each argument position
- Empty string and empty delimiter
- Positive, negative, and zero count
- Count exceeding number of delimiters
- Multi-character delimiters
- Delimiter not found in string
- Multibyte UTF-8 characters (Chinese)
- Delimiter at start/end of string
- Delimiter equal to the full string
- Large count values (INT_MAX, -INT_MAX)
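For reviewers who want a quick mental model of the behavior the test cases above exercise, here is a minimal reference sketch in Python. It is not code from this PR and not the Comet or DataFusion implementation. It assumes the commonly documented Spark behavior: a NULL in any argument yields NULL, an empty delimiter or a zero count yields an empty string, and a count larger than the number of delimiters yields the whole input. The function name simply mirrors the SQL function. The sketch works on Unicode code points rather than bytes.

```python
from typing import Optional


def substring_index(s: Optional[str], delim: Optional[str],
                    count: Optional[int]) -> Optional[str]:
    """Illustrative model of substring_index(str, delim, count); not the PR's code."""
    # Assumption: NULL (None) in any argument position propagates to the result.
    if s is None or delim is None or count is None:
        return None
    # Assumption: an empty delimiter or a zero count gives an empty string.
    if delim == "" or count == 0:
        return ""
    if count > 0:
        # Walk left to right to the count-th delimiter occurrence.
        idx = -1
        for _ in range(count):
            idx = s.find(delim, idx + 1)
            if idx < 0:
                # Fewer than `count` delimiters: return the whole string.
                return s
        return s[:idx]
    # Negative count: walk right to left to the |count|-th occurrence.
    idx = len(s) - len(delim) + 1
    for _ in range(-count):
        # Only accept matches that start at or before idx - 1.
        idx = s.rfind(delim, 0, idx - 1 + len(delim))
        if idx < 0:
            return s
    return s[idx + len(delim):]


print(substring_index("www.apache.org", ".", 2))
print(substring_index("www.apache.org", ".", -1))
```

Both loops exit as soon as no further delimiter is found, so very large counts such as INT_MAX finish immediately on short inputs instead of iterating billions of times.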
