andygrove opened a new pull request, #4566:
URL: https://github.com/apache/datafusion-comet/pull/4566

   ## Which issue does this PR close?
   
   Closes #4451
   Closes #4450
   Closes #4449
   
   ## Rationale for this change
   
   These three correctness bugs were surfaced by the date/time audit (#4448). 
All three cause Comet to diverge from Spark for `make_date` and `next_day`:
   
   - **#4451**: `make_date(year, month, day)` returns `NULL` for an invalid 
triple even when `spark.sql.ansi.enabled=true`, where Spark throws via 
`ansiDateTimeArgumentOutOfRange` (4.0) / `ansiDateTimeError` (3.4/3.5).
   - **#4450**: `next_day(date, dayOfWeek)` trims leading/trailing whitespace 
from `dayOfWeek` before matching. Spark's 
`DateTimeUtils.getDayOfWeekFromString` does not trim, so values like `' MO '` 
succeed in Comet but are invalid in Spark.
   - **#4449**: `next_day(date, dayOfWeek)` returns `NULL` for an unrecognised 
`dayOfWeek` even under ANSI, where Spark throws `SparkIllegalArgumentException` 
(3.5+) / `IllegalArgumentException` (3.4).
   
   ## What changes are included in this PR?
   
   - `make_date`: `SparkMakeDate` now carries the ANSI flag (`fail_on_error`). 
On invalid input under ANSI it throws a `java.time`-style message 
(`MonthOfYear` / `DayOfMonth` range, or `Invalid date '...'`), reproducing the 
text Spark wraps in `DATETIME_FIELD_OUT_OF_BOUNDS` / `_LEGACY_ERROR_TEMP_2000`. 
Non-ANSI behaviour (return `NULL`) is unchanged.
   - `next_day`: replaced the upstream `datafusion-spark::SparkNextDay` (which 
trims `dayOfWeek` and never throws) with a Comet-native implementation that 
matches `DateTimeUtils.getDayOfWeekFromString` exactly: no whitespace trimming, 
and throws `Illegal input for day of week` under ANSI while returning `NULL` 
otherwise.
   - The resolved ANSI flag is read from `MakeDate.failOnError` / 
`NextDay.failOnError` in the Scala serde and passed to native through the 
existing `ScalarFunc.fail_on_error` proto field. No new proto fields or native 
shim traits were needed.
   
   ## How are these changes tested?
   
   - New Rust unit tests for `next_day` (no-trim day-of-week parsing, next-date 
math).
   - SQL file tests: `make_date_ansi.sql` and `next_day_ansi.sql` assert the 
ANSI throw paths via `expect_error`, each with a non-error sentinel query so 
the assertions cannot pass vacuously through a Spark fallback; `next_day.sql` 
gains a non-ANSI whitespace regression case. Verified passing on Spark 3.5 and 
Spark 4.1.
   
   Note: #4448 (still open) adds these same fixtures as `ignore(...)` 
reproducers; if that merges first this PR will need a trivial rebase to drop 
the duplicated files.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to