Adam-Alani opened a new pull request, #22851:
URL: https://github.com/apache/datafusion/pull/22851

   ## Why
   
   In DataFusion today, `date_part`'s return type depends on the part being
   extracted: most parts (`year`, `month`, `day`, `hour`, `minute`,
   `second`, ...) return `Int32`, `nanosecond` returns `Int64`, and `epoch`
   returns `Float64`. That is internally inconsistent and disagrees with
   PostgreSQL, which defines `date_part` to always return `double
   precision` regardless of the field:
   
   > The `date_part` function is modeled on the traditional Ingres
   > equivalent to the SQL-standard function `extract` ... It returns
   > values of type `double precision`.
   >
   > — https://www.postgresql.org/docs/current/functions-datetime.html
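
   For reference, here is a minimal sketch of the return-type rule before
   and after this change. `ResultType`, `old_return_type`, and
   `new_return_type` are hypothetical names. They stand in for Arrow's
   `DataType` and for the logic in `DatePartFunc::return_field_from_args`.
   The match arms mirror only the cases listed above, and the real
   function may recognize more part names.

   ```rust
   /// Hypothetical stand-in for the Arrow `DataType` variants involved here.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   enum ResultType {
       Int32,
       Int64,
       Float64,
   }

   /// Old behavior (sketch): the result type depends on the extracted part.
   fn old_return_type(part: &str) -> ResultType {
       match part {
           "epoch" => ResultType::Float64,
           "nanosecond" => ResultType::Int64,
           // year, month, day, hour, minute, second, ...
           _ => ResultType::Int32,
       }
   }

   /// New behavior (sketch): always Float64, like PostgreSQL's `double precision`.
   fn new_return_type(_part: &str) -> ResultType {
       ResultType::Float64
   }
   ```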
   
   Returning `Int32` here also causes subtle surprises. For example,
   arithmetic on `date_part('year', col)` can silently overflow, and
   consumers that follow PG semantics (BI tools, dashboards, frontends
   that expect PG result types) have to add explicit casts.
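
   The sketch below makes the overflow concrete by scaling a year value to
   seconds. It uses a 365-day year, chosen only for illustration. The
   product `2024 * 31_536_000` is about 6.4e10, which does not fit in an
   `i32`. Whether an engine errors or wraps on integer overflow depends on
   its arithmetic settings. The `wrapping_mul` variant shows the silent
   case. The `f64` variant is exact because the product stays well below
   2^53. All function names here are hypothetical.

   ```rust
   /// Seconds in a 365-day year; chosen only to produce a large product.
   const SECONDS_PER_YEAR: i32 = 31_536_000;

   /// `Int32` result with overflow detection: `None` when the product overflows.
   fn year_to_seconds_checked(year: i32) -> Option<i32> {
       year.checked_mul(SECONDS_PER_YEAR)
   }

   /// `Int32` result with two's-complement wrapping: overflow goes unnoticed.
   fn year_to_seconds_wrapping(year: i32) -> i32 {
       year.wrapping_mul(SECONDS_PER_YEAR)
   }

   /// `Float64` result: exact for this range, since it stays below 2^53.
   fn year_to_seconds_f64(year: f64) -> f64 {
       year * f64::from(SECONDS_PER_YEAR)
   }
   ```

   With a year of 2024, the three variants behave differently:

   - The checked variant returns `None`.
   - The wrapping variant returns `-595645440`.
   - The `f64` variant returns `63828864000`.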
   
   ## What
   
   `date_part` (and its `datepart` alias) now always returns `Float64`.
   `Extract(field FROM source)`, which parses to `date_part`, is affected
   the same way.
   
   User-visible effects:
   
   - Query result types for `date_part`/`extract` are now `Float64` rather
     than `Int32`/`Int64` (except for `epoch`, which was already `Float64`).
   - Plans, schemas, and `arrow_typeof(...)` outputs reflect this.
   - Downstream Spark dialect behavior is unchanged: the Spark
     `date_part` wrapper casts the result back to `Int32`.
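
   The cast back is safe for Spark because any value that used to come
   out as `Int32` survives the trip through `Float64` unchanged. Every
   `i32` is exactly representable as an `f64`. Below is a minimal sketch
   of that round trip. It uses Rust's `From`/`as` conversions as a
   stand-in for Arrow's `cast` kernel, and `roundtrip_via_f64` is a
   hypothetical name.

   ```rust
   /// Hypothetical helper modeling the Spark wrapper: widen an `Int32` part
   /// value to `Float64` (what `date_part` now returns), then narrow it back
   /// to `Int32` (the `Cast(..., Int32)` added in `datafusion-spark`).
   fn roundtrip_via_f64(v: i32) -> i32 {
       // Lossless: every i32 fits in f64's 53-bit significand.
       let widened = f64::from(v);
       // Exact here because `widened` holds an integer within i32 range.
       widened as i32
   }
   ```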
   
   ## How
   
   - `DatePartFunc::return_field_from_args` always advertises `Float64`.
   - `DatePartFunc::invoke_with_args` casts the Arrow kernel result to
     `Float64` at the end. This is a no-op for `epoch`, whose kernel
     already returns `Float64`, and an Arrow `cast` for the
     `Int32`/`Int64` cases.
   - `DatePartFunc::preimage` learns to accept a `Float64` literal so that
     the existing `date_part('year', col) = 2024` → date-range pushdown
     keeps working after the binary op coerces `2024` to `Float64`.
   - The dead `is_epoch` / `is_nanosecond` helpers from the old return-type
     switch are removed.
   - The Spark `date_part` wrapper (`datafusion-spark`) now wraps the
     simplified DataFusion call in a `Cast(..., Int32)`. Spark's own
     `SparkDatePart::return_field_from_args` already declares `Int32`, so
     this keeps Spark semantics and the Spark sqllogictests untouched.
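
   The preimage rewrite needs the coerced literal to be an integer-valued
   double before it can become a year range. Below is a minimal sketch of
   that check. It assumes a half-open range
   `[year-01-01, (year+1)-01-01)`. `year_preimage` is a hypothetical name,
   and it returns plain years rather than DataFusion literals. The real
   `DatePartFunc::preimage` may treat fractional or out-of-range values
   differently.

   ```rust
   /// Hypothetical sketch: map the literal from `date_part('year', col) = lit`
   /// to a half-open year range (start, end), i.e.
   /// `col >= start-01-01 AND col < end-01-01`. Returns `None` when no rewrite
   /// should be attempted.
   fn year_preimage(lit: f64) -> Option<(i32, i32)> {
       // After binary-op coercion the literal arrives as Float64 (e.g. 2024.0).
       // NaN, infinities, and fractional values cannot equal a whole year.
       if !lit.is_finite() || lit.fract() != 0.0 {
           return None;
       }
       // Keep `year + 1` from overflowing i32.
       if lit < f64::from(i32::MIN) || lit >= f64::from(i32::MAX) {
           return None;
       }
       let year = lit as i32; // exact: integer-valued and in range
       Some((year, year + 1))
   }
   ```

   Returning `None` for a fractional literal such as `2024.5` is the
   conservative choice. It only skips the pushdown, and the original
   predicate is still evaluated (that predicate can never be true for a
   whole-number year).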
   
   Tests / validation:
   
   - `datafusion/sqllogictest/test_files/datetime/date_part.slt`,
     `datetime/timestamps.slt`, `clickbench.slt`, and `group_by.slt`
     updated (`query I[I...]` → `query R[R...]`; `arrow_typeof` strings go
     from `Int32`/`Int64` to `Float64`). Numeric expected values are
     unchanged because integer-valued doubles render exactly like integers
     (`2020`, not `2020.0`) in the sqllogictest `R` format.
   - Spark `datafusion/sqllogictest/test_files/spark/datetime/date_part.slt`
     is unchanged and still passes thanks to the cast in the wrapper.
   - `docs/source/user-guide/sql/scalar_functions.md` regenerated via
     `./dev/update_function_docs.sh`, and the prose entry in
     `docs/source/user-guide/expressions.md` updated.
   - `cargo fmt --all`, `cargo clippy -p datafusion-functions -p
     datafusion-spark --all-features -- -D warnings`, and the full
     `cargo test --test sqllogictests -p datafusion-sqllogictest` (483
     files) pass locally.
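
   Rust's float formatting gives a rough picture of why the expected
   values stay stable: `Display` prints an integer-valued `f64` without a
   fractional part. This is an analogy only. `render_value` is a
   hypothetical helper, not the sqllogictest `R` formatter, whose
   precision and rounding rules may differ for non-integer values.

   ```rust
   /// Hypothetical renderer following the rule described above: an
   /// integer-valued double prints like the integer it holds.
   fn render_value(v: f64) -> String {
       // Rust's `Display` for f64 omits a trailing ".0" (2020.0 -> "2020"),
       // whereas `Debug` would print "2020.0".
       format!("{v}")
   }
   ```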

