andygrove opened a new issue, #22717: URL: https://github.com/apache/datafusion/issues/22717
### Describe the bug `datafusion-spark`'s `SparkNextDay` trims the `dayOfWeek` argument before matching it: https://github.com/apache/datafusion/blob/main/datafusion/spark/src/function/datetime/next_day.rs (in `spark_next_day`): ```rust let day_of_week = day_of_week.trim().to_uppercase(); ``` Spark's `DateTimeUtils.getDayOfWeekFromString` does **not** trim — it only uppercases and then matches character-for-character: ```scala def getDayOfWeekFromString(string: UTF8String): Int = { val dowString = string.toString.toUpperCase(Locale.ROOT) dowString match { case "SU" | "SUN" | "SUNDAY" => SUNDAY case "MO" | "MON" | "MONDAY" => MONDAY ... case _ => throw new SparkIllegalArgumentException(errorClass = "ILLEGAL_DAY_OF_WEEK", ...) } } ``` So a padded value such as `' MO '` is **not** a valid day name in Spark, but the trim makes `datafusion-spark` accept it. This is a correctness divergence that is independent of `spark.sql.ansi.enabled`. ### To Reproduce ```sql SELECT next_day(DATE '2024-01-01', ' MO '); ``` - Spark: `NULL` (non-ANSI) / throws `ILLEGAL_DAY_OF_WEEK` (ANSI), because `' MO '` is not a recognised day name. - `datafusion-spark`: `2024-01-08` (the trim makes it match `MONDAY`). ### Expected behavior Match Spark: do not trim the `dayOfWeek` argument. Only an exact (case-insensitive) match of `SU`/`SUN`/`SUNDAY` … `SA`/`SAT`/`SATURDAY` should be accepted; anything else is invalid. ### Additional context Related: the same function carries several `// TODO: if spark.sql.ansi.enabled is false, returns NULL instead of an error for a malformed dayOfWeek` comments. `datafusion-spark` has no access to Spark's `spark.sql.ansi.enabled` session config, so the throw-vs-NULL choice has to be made by the embedding engine; that part is a known gap rather than a bug. The whitespace trim above, however, is wrong for every configuration. Found while fixing the equivalent divergence downstream in Apache DataFusion Comet (apache/datafusion-comet#4450), where we replaced the call into `datafusion-spark::SparkNextDay` with a Comet-native implementation that does not trim. Version: observed in `datafusion-spark` 53.1.0; the trim is still present on `main`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
