andygrove opened a new pull request, #4627: URL: https://github.com/apache/datafusion-comet/pull/4627
## Which issue does this PR close?

Closes #4477.

## Rationale for this change

Spark 4.1.1 added the `spark.sql.legacy.truncateForEmptyRegexSplit` flag, which makes `StringToMap` truncate trailing empty entries from the split result when enabled. Comet's native `str_to_map` always behaves as if the flag were `false`, so with legacy truncation enabled it returns trailing empty entries that Spark would have dropped, producing incorrect results.

## What changes are included in this PR?

`CometStrToMap` now reports `Incompatible` when `spark.sql.legacy.truncateForEmptyRegexSplit=true`, so the expression falls back to Spark unless the user explicitly opts in via `spark.comet.expression.StringToMap.allowIncompatible=true`. The default (non-legacy) behavior is unchanged: `str_to_map` continues to run natively. The config is read by string key with a `false` default so it resolves on Spark versions where the config is not registered.

## How are these changes tested?

Added a SQL file test `expressions/map/str_to_map_legacy_truncate.sql` that sets the legacy flag and asserts that `str_to_map` falls back to Spark (for both literal and column inputs) while still producing Spark-matching results. The existing `str_to_map.sql` test confirms native execution is unaffected when the flag is off.
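The fallback rule described under "What changes are included in this PR?" can be modeled roughly as follows. This is a minimal Python sketch for illustration only, not Comet's Scala implementation: the helper names (`conf_bool`, `support_level`, `runs_natively`) and the use of a plain dictionary for the session config are assumptions, and only the two config keys are taken from the description above.

```python
from typing import Mapping

# Config keys quoted from the PR description.
LEGACY_KEY = "spark.sql.legacy.truncateForEmptyRegexSplit"
ALLOW_KEY = "spark.comet.expression.StringToMap.allowIncompatible"


def conf_bool(conf: Mapping[str, str], key: str, default: bool = False) -> bool:
    """Hypothetical helper: look a flag up by its string key.

    An absent key (for example, on a Spark version that does not
    register the config) resolves to the default instead of failing.
    """
    raw = conf.get(key)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


def support_level(conf: Mapping[str, str]) -> str:
    """Hypothetical model of the support level reported for str_to_map."""
    if conf_bool(conf, LEGACY_KEY):
        # Native str_to_map keeps trailing empty entries, so it would
        # disagree with Spark when legacy truncation is enabled.
        return "Incompatible"
    return "Compatible"


def runs_natively(conf: Mapping[str, str]) -> bool:
    """Native execution unless incompatible and the user has not opted in."""
    if support_level(conf) == "Compatible":
        return True
    return conf_bool(conf, ALLOW_KEY)
```

Under these assumptions, an empty config runs natively, setting only the legacy flag forces a fallback to Spark, and setting both the legacy flag and the opt-in flag runs natively again.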
