andygrove opened a new pull request, #4755: URL: https://github.com/apache/datafusion-comet/pull/4755
## Which issue does this PR close? Relates to #4754. > **Experimental spike, not intended for merge as-is.** This is a proof of concept to scope what migrating a temporal expression from `chrono` to `jiff` involves. Opening as a draft to share findings and gather direction. ## Rationale for this change `chrono` / `chrono-tz` are expected to be deprecated, and #4754 proposes evaluating a move to `jiff`. Rather than estimate in the abstract, this spike converts one self-contained, timezone-free expression (`next_day`) end to end so we can see the concrete API mapping, the dependency implications, and any behavioral edges before committing to a broader migration. ## What changes are included in this PR? - Rewrites `next_day.rs` to compute entirely with `jiff::civil` (`Date`, `Weekday`), removing all `chrono` usage from that file. - Adds `jiff` to `datafusion-comet-spark-expr` with `default-features = false, features = ["std"]` (a date-only expression does not need the tzdb bundles `cargo add` enables by default). - Translates the existing unit tests to jiff. ### API mapping observed | chrono | jiff | | --- | --- | | `Weekday::Mon` .. | `Weekday::Monday` .. (full names) | | `Date32Type::to_naive_date_opt(i32)` | `Date::constant(1970, 1, 1).checked_add(days.days())` | | `weekday().days_since(other)` | `weekday().since(other)` (identical semantics) | | `date + Duration::days(n)` then `from_naive_date` | `days + advance` (epoch-day arithmetic) | ### Findings 1. **arrow-rs does not need to move first.** The Arrow boundary is raw `i32` epoch days (`Date32Array::value`). The expression builds jiff types directly from the integer and bypasses Arrow's chrono-returning helpers (`to_naive_date_opt`, `date32_to_datetime`). Comet's own code can move to jiff today. 2. **`chrono` stays in the dependency tree regardless**, because `arrow` depends on it (and pulls `chrono-tz` via its feature). jiff adds a dependency; it cannot remove chrono until arrow-rs migrates. The near-term benefit is API ergonomics in Comet, not a smaller tree. 3. **jiff intentionally keeps epoch-day conversion private** (`from_unix_epoch_day` / `to_unix_epoch_day` are crate-private). The public idiom is anchor date plus a day `Span`. Expressions that genuinely need `Date` to epoch-day will use `epoch.until(date).get_days()`, which is more verbose than chrono. 4. **Narrower range.** jiff civil dates span years -9999..=9999; chrono and Date32 are wider. Out-of-range values map to NULL, preserving the existing nullable behavior. Not a concern for realistic Spark data. 5. **Timezone expressions are the real cost.** They would need jiff's tzdb features and a rework of `timezone.rs` (currently chrono-tz based). That path is deliberately out of scope here. Suggested follow-up order if we proceed: tz-free date expressions first (next_day, day/month name, make_date) behind a shared epoch-day helper, then timestamp/timezone expressions, with full chrono removal gated on arrow-rs. ## How are these changes tested? The existing `next_day` Rust unit tests were translated to jiff and pass, and `cargo clippy` is clean. End-to-end behavior continues to be covered by the existing JVM `next_day` tests, since the public function semantics are unchanged. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
