Adam-Alani opened a new pull request, #22851: URL: https://github.com/apache/datafusion/pull/22851
## Why

In DataFusion today, `date_part`'s return type depends on the part being extracted: most parts (`year`, `month`, `day`, `hour`, `minute`, `second`, ...) return `Int32`, `nanosecond` returns `Int64`, and `epoch` returns `Float64`. That is internally inconsistent and disagrees with PostgreSQL, which defines `date_part` to always return `double precision` regardless of the field:

> The `date_part` function is modeled on the traditional Ingres
> equivalent to the SQL-standard function `extract` ... It returns
> values of type `double precision`.
>
> https://www.postgresql.org/docs/current/functions-datetime.html

Picking up `Int32` here also caused subtle surprises: e.g. `date_part('year', col)` arithmetic silently overflows, and consumers that follow PG semantics (BI tools, dashboards, frontends that share a shape with PG) had to add explicit casts.

## What

`date_part` (and its `datepart` alias) now always returns `Float64`. `Extract(field FROM source)`, which parses to `date_part`, is affected the same way.

User-visible effects:

- Query result types for `date_part`/`extract` are now `Float64` rather than `Int32`/`Int64` (except for `epoch`, which was already `Float64`).
- Plans, schemas, and `arrow_typeof(...)` outputs reflect this.
- Downstream Spark dialect behavior is unchanged: the Spark `date_part` wrapper casts back to `Int32`.

## How

- `DatePartFunc::return_field_from_args` always advertises `Float64`.
- `DatePartFunc::invoke_with_args` casts the arrow kernel result to `Float64` at the end (no-op for `epoch`, which already returns `Float64`; `cast` for the `Int32`/`Int64` cases).
- `DatePartFunc::preimage` learns to accept a `Float64` literal so that the existing `date_part('year', col) = 2024` → date-range pushdown keeps working after the binary op coerces `2024` to `Float64`.
- The dead `is_epoch` / `is_nanosecond` helpers from the old return-type switch are removed.
- The Spark `date_part` wrapper (`datafusion-spark`) now wraps the simplified DataFusion call in a `Cast(..., Int32)`. Spark's own `SparkDatePart::return_field_from_args` already declares `Int32`, so this keeps Spark semantics and the Spark sqllogictests untouched.

Tests / validation:

- `datafusion/sqllogictest/test_files/datetime/date_part.slt`, `datetime/timestamps.slt`, `clickbench.slt`, and `group_by.slt` updated (`query I[I...]` → `query R[R...]`; `arrow_typeof` strings go from `Int32`/`Int64` to `Float64`). Numeric expected values are unchanged because integer-valued doubles render the same way (`2020` not `2020.0`) in the sqllogictest `R` format.
- Spark `datafusion/sqllogictest/test_files/spark/datetime/date_part.slt` is unchanged and still passes thanks to the cast in the wrapper.
- `docs/source/user-guide/sql/scalar_functions.md` regenerated via `./dev/update_function_docs.sh`, and the prose entry in `docs/source/user-guide/expressions.md` updated.
- `cargo fmt --all`, `cargo clippy -p datafusion-functions -p datafusion-spark --all-features -- -D warnings`, and the full `cargo test --test sqllogictests -p datafusion-sqllogictest` (483 files) pass locally.
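To illustrate the `preimage` point under How: once the binary op coerces `2024` to `Float64`, a year-equality predicate can only be turned into a date range when the literal is a whole number. The following is a minimal sketch under stated assumptions, not the code in this PR. `year_preimage` is a hypothetical helper, it returns plain date strings rather than Arrow scalars or DataFusion interval types, and it restricts itself to four-digit years to keep formatting simple.

```rust
/// Hypothetical helper (not DataFusion's `DatePartFunc::preimage`).
/// Maps `date_part('year', col) = <literal>` to a half-open date range
/// `[start, end)` when the Float64 literal is an exact whole year.
fn year_preimage(literal: f64) -> Option<(String, String)> {
    // Reject NaN, infinities, and fractional values such as 2024.5:
    // no date has a fractional year, so no range rewrite applies.
    if !literal.is_finite() || literal.fract() != 0.0 {
        return None;
    }
    // Assumption for this sketch only: handle years 1 through 9998 so
    // both bounds format as four-digit years.
    if !(1.0..=9998.0).contains(&literal) {
        return None;
    }
    // Safe after the checks above: the value is integral and in range.
    let year = literal as i32;
    Some((
        format!("{:04}-01-01", year),     // inclusive lower bound
        format!("{:04}-01-01", year + 1), // exclusive upper bound
    ))
}
```

In this sketch, `year_preimage(2024.0)` yields the range from `2024-01-01` (inclusive) to `2025-01-01` (exclusive), while `year_preimage(2024.5)` yields `None`, which would leave the original predicate in place.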
