andygrove opened a new pull request, #4755:
URL: https://github.com/apache/datafusion-comet/pull/4755

   ## Which issue does this PR close?
   
   Relates to #4754.
   
   > **Experimental spike, not intended for merge as-is.** This is a proof of 
concept to scope what migrating a temporal expression from `chrono` to `jiff` 
involves. Opening as a draft to share findings and gather direction.
   
   ## Rationale for this change
   
   `chrono` / `chrono-tz` are expected to be deprecated, and #4754 proposes 
evaluating a move to `jiff`. Rather than estimate in the abstract, this spike 
converts one self-contained, timezone-free expression (`next_day`) end to end 
so we can see the concrete API mapping, the dependency implications, and any 
behavioral edges before committing to a broader migration.
   
   ## What changes are included in this PR?
   
   - Rewrites `next_day.rs` to compute entirely with `jiff::civil` (`Date`, 
`Weekday`), removing all `chrono` usage from that file.
   - Adds `jiff` to `datafusion-comet-spark-expr` with `default-features = 
false, features = ["std"]` (a date-only expression does not need the tzdb 
bundles `cargo add` enables by default).
   - Translates the existing unit tests to jiff.
   
   ### API mapping observed
   
   | chrono | jiff |
   | --- | --- |
   | `Weekday::Mon` .. | `Weekday::Monday` .. (full names) |
   | `Date32Type::to_naive_date_opt(i32)` | `Date::constant(1970, 1, 
1).checked_add(days.days())` |
   | `weekday().days_since(other)` | `weekday().since(other)` (identical 
semantics) |
   | `date + Duration::days(n)` then `from_naive_date` | `days + advance` 
(epoch-day arithmetic) |
   
   ### Findings
   
   1. **arrow-rs does not need to move first.** The Arrow boundary is raw `i32` 
epoch days (`Date32Array::value`). The expression builds jiff types directly 
from the integer and bypasses Arrow's chrono-returning helpers 
(`to_naive_date_opt`, `date32_to_datetime`). Comet's own code can move to jiff 
today.
   2. **`chrono` stays in the dependency tree regardless**, because `arrow` 
depends on it (and pulls `chrono-tz` via its feature). jiff adds a dependency; 
it cannot remove chrono until arrow-rs migrates. The near-term benefit is API 
ergonomics in Comet, not a smaller tree.
   3. **jiff intentionally keeps epoch-day conversion private** 
(`from_unix_epoch_day` / `to_unix_epoch_day` are crate-private). The public 
idiom is anchor date plus a day `Span`. Expressions that genuinely need `Date` 
to epoch-day will use `epoch.until(date).get_days()`, which is more verbose 
than chrono.
   4. **Narrower range.** jiff civil dates span years -9999..=9999; chrono and 
Date32 are wider. Out-of-range values map to NULL, preserving the existing 
nullable behavior. Not a concern for realistic Spark data.
   5. **Timezone expressions are the real cost.** They would need jiff's tzdb 
features and a rework of `timezone.rs` (currently chrono-tz based). That path 
is deliberately out of scope here.
   
   Suggested follow-up order if we proceed: tz-free date expressions first 
(next_day, day/month name, make_date) behind a shared epoch-day helper, then 
timestamp/timezone expressions, with full chrono removal gated on arrow-rs.
   
   ## How are these changes tested?
   
   The existing `next_day` Rust unit tests were translated to jiff and pass, 
and `cargo clippy` is clean. End-to-end behavior continues to be covered by the 
existing JVM `next_day` tests, since the public function semantics are 
unchanged.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to