andygrove opened a new pull request, #4627:
URL: https://github.com/apache/datafusion-comet/pull/4627

   ## Which issue does this PR close?
   
   Closes #4477.
   
   ## Rationale for this change
   
   Spark 4.1.1 added the `spark.sql.legacy.truncateForEmptyRegexSplit` flag, 
which makes `StringToMap` truncate trailing empty entries from the split result 
when enabled. Comet's native `str_to_map` always behaves as if the flag were 
`false`, so with legacy truncation enabled it returns trailing empty entries 
that Spark would have dropped, producing incorrect results.
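
To make the mismatch concrete, here is a minimal Python model of the two behaviors described above. It is a simplified sketch, not Spark's or Comet's implementation: the function name `str_to_map_model` and the `truncate_trailing_empty` parameter are hypothetical, and the delimiters are treated as literal strings rather than regular expressions.

```python
def str_to_map_model(text, pair_delim=",", kv_delim=":",
                     truncate_trailing_empty=False):
    """Toy model of str_to_map (hypothetical helper, literal delimiters)."""
    # Split into entries, keeping any trailing empty strings.
    entries = text.split(pair_delim)

    # Assumed legacy behavior: drop empty entries at the end of the split.
    if truncate_trailing_empty:
        while entries and entries[-1] == "":
            entries.pop()

    result = {}
    for entry in entries:
        # An entry without a key/value delimiter maps its key to None.
        key, sep, value = entry.partition(kv_delim)
        result[key] = value if sep else None
    return result


# Flag off: the trailing empty entry survives as an empty key.
non_legacy = str_to_map_model("a:1,b:2,")
# Flag on: the trailing empty entry is dropped before the map is built.
legacy = str_to_map_model("a:1,b:2,", truncate_trailing_empty=True)
```

Under this model, `non_legacy` carries an extra entry with an empty key that `legacy` does not, which is the kind of difference that makes the native result disagree with Spark when the flag is enabled.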
   
   ## What changes are included in this PR?
   
   `CometStrToMap` now reports `Incompatible` when 
`spark.sql.legacy.truncateForEmptyRegexSplit=true`, so the expression falls 
back to Spark unless the user explicitly opts in via 
`spark.comet.expression.StringToMap.allowIncompatible=true`. The default 
(non-legacy) behavior is unchanged: `str_to_map` continues to run natively. The 
legacy flag is read by its string key with a default of `false`, so the lookup 
also works on Spark versions that do not register the config.
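
The decision described above can be sketched as follows. This is an illustrative Python model, not the Scala code in `CometStrToMap`: the helper names `conf_bool`, `support_level`, and `runs_natively` are hypothetical, the session config is modeled as a plain dict of strings, and the `"Compatible"` label for the non-legacy case is an assumption.

```python
LEGACY_KEY = "spark.sql.legacy.truncateForEmptyRegexSplit"
ALLOW_KEY = "spark.comet.expression.StringToMap.allowIncompatible"


def conf_bool(conf, key, default=False):
    """Read a boolean by string key; fall back to the default if unset."""
    raw = conf.get(key)
    if raw is None:
        # Covers Spark versions where the key is not registered at all.
        return default
    return str(raw).strip().lower() == "true"


def support_level(conf):
    """Report 'Incompatible' only when legacy truncation is enabled."""
    return "Incompatible" if conf_bool(conf, LEGACY_KEY) else "Compatible"


def runs_natively(conf):
    """Native execution unless incompatible and the user has not opted in."""
    if support_level(conf) == "Compatible":
        return True
    return conf_bool(conf, ALLOW_KEY)


default_case = runs_natively({})
legacy_case = runs_natively({LEGACY_KEY: "true"})
opted_in_case = runs_natively({LEGACY_KEY: "true", ALLOW_KEY: "true"})
```

With an empty config the expression stays native; with only the legacy flag set it falls back to Spark; with both the legacy flag and the opt-in set it runs natively again, accepting the known difference in results.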
   
   ## How are these changes tested?
   
   Added a SQL file test `expressions/map/str_to_map_legacy_truncate.sql` that 
sets the legacy flag and asserts that `str_to_map` falls back to Spark (for 
both literal and column inputs) while still producing Spark-matching results. 
The existing `str_to_map.sql` test confirms native execution is unaffected when 
the flag is off.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

