andygrove opened a new pull request, #4566: URL: https://github.com/apache/datafusion-comet/pull/4566
## Which issue does this PR close? Closes #4451 Closes #4450 Closes #4449 ## Rationale for this change These three correctness bugs were surfaced by the date/time audit (#4448). All three cause Comet to diverge from Spark for `make_date` and `next_day`: - **#4451**: `make_date(year, month, day)` returns `NULL` for an invalid triple even when `spark.sql.ansi.enabled=true`, where Spark throws via `ansiDateTimeArgumentOutOfRange` (4.0) / `ansiDateTimeError` (3.4/3.5). - **#4450**: `next_day(date, dayOfWeek)` trims leading/trailing whitespace from `dayOfWeek` before matching. Spark's `DateTimeUtils.getDayOfWeekFromString` does not trim, so values like `' MO '` succeed in Comet but are invalid in Spark. - **#4449**: `next_day(date, dayOfWeek)` returns `NULL` for an unrecognised `dayOfWeek` even under ANSI, where Spark throws `SparkIllegalArgumentException` (3.5+) / `IllegalArgumentException` (3.4). ## What changes are included in this PR? - `make_date`: `SparkMakeDate` now carries the ANSI flag (`fail_on_error`). On invalid input under ANSI it throws a `java.time`-style message (`MonthOfYear` / `DayOfMonth` range, or `Invalid date '...'`), reproducing the text Spark wraps in `DATETIME_FIELD_OUT_OF_BOUNDS` / `_LEGACY_ERROR_TEMP_2000`. Non-ANSI behaviour (return `NULL`) is unchanged. - `next_day`: replaced the upstream `datafusion-spark::SparkNextDay` (which trims `dayOfWeek` and never throws) with a Comet-native implementation that matches `DateTimeUtils.getDayOfWeekFromString` exactly: no whitespace trimming, and throws `Illegal input for day of week` under ANSI while returning `NULL` otherwise. - The resolved ANSI flag is read from `MakeDate.failOnError` / `NextDay.failOnError` in the Scala serde and passed to native through the existing `ScalarFunc.fail_on_error` proto field. No new proto fields or native shim traits were needed. ## How are these changes tested? - New Rust unit tests for `next_day` (no-trim day-of-week parsing, next-date math). - SQL file tests: `make_date_ansi.sql` and `next_day_ansi.sql` assert the ANSI throw paths via `expect_error`, each with a non-error sentinel query so the assertions cannot pass vacuously through a Spark fallback; `next_day.sql` gains a non-ANSI whitespace regression case. Verified passing on Spark 3.5 and Spark 4.1. Note: #4448 (still open) adds these same fixtures as `ignore(...)` reproducers; if that merges first this PR will need a trivial rebase to drop the duplicated files. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
