andygrove opened a new issue, #22717:
URL: https://github.com/apache/datafusion/issues/22717

   ### Describe the bug
   
   `datafusion-spark`'s `SparkNextDay` trims the `dayOfWeek` argument before 
matching it:
   
   
https://github.com/apache/datafusion/blob/main/datafusion/spark/src/function/datetime/next_day.rs
 (in `spark_next_day`):
   
   ```rust
   let day_of_week = day_of_week.trim().to_uppercase();
   ```
   
   Spark's `DateTimeUtils.getDayOfWeekFromString` does **not** trim — it only 
uppercases and then matches character-for-character:
   
   ```scala
   def getDayOfWeekFromString(string: UTF8String): Int = {
     val dowString = string.toString.toUpperCase(Locale.ROOT)
     dowString match {
       case "SU" | "SUN" | "SUNDAY" => SUNDAY
       case "MO" | "MON" | "MONDAY" => MONDAY
       ...
       case _ => throw new SparkIllegalArgumentException(errorClass = 
"ILLEGAL_DAY_OF_WEEK", ...)
     }
   }
   ```
   
   So a padded value such as `' MO '` is **not** a valid day name in Spark, but 
the trim makes `datafusion-spark` accept it. This is a correctness divergence 
that is independent of `spark.sql.ansi.enabled`.
   
   ### To Reproduce
   
   ```sql
   SELECT next_day(DATE '2024-01-01', ' MO ');
   ```
   
   - Spark: `NULL` (non-ANSI) / throws `ILLEGAL_DAY_OF_WEEK` (ANSI), because `' 
MO '` is not a recognised day name.
   - `datafusion-spark`: `2024-01-08` (the trim makes it match `MONDAY`).
   
   ### Expected behavior
   
   Match Spark: do not trim the `dayOfWeek` argument. Only an exact 
(case-insensitive) match of `SU`/`SUN`/`SUNDAY` … `SA`/`SAT`/`SATURDAY` should 
be accepted; anything else is invalid.
   
   ### Additional context
   
   Related: the same function carries several `// TODO: if 
spark.sql.ansi.enabled is false, returns NULL instead of an error for a 
malformed dayOfWeek` comments. `datafusion-spark` has no access to Spark's 
`spark.sql.ansi.enabled` session config, so the throw-vs-NULL choice has to be 
made by the embedding engine; that part is a known gap rather than a bug. The 
whitespace trim above, however, is wrong for every configuration.
   
   Found while fixing the equivalent divergence downstream in Apache DataFusion 
Comet (apache/datafusion-comet#4450), where we replaced the call into 
`datafusion-spark::SparkNextDay` with a Comet-native implementation that does 
not trim.
   
   Version: observed in `datafusion-spark` 53.1.0; the trim is still present on 
`main`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to