andygrove commented on issue #4754:
URL:
https://github.com/apache/datafusion-comet/issues/4754#issuecomment-4835284249
I ran a small spike to scope this concretely, converting one timezone-free
expression (`next_day`) from chrono to jiff end to end: #4755. Sharing what we
learned and a suggested approach for the broader work.
### Key learnings
1. **We do not need to wait on arrow-rs.** The Arrow boundary for date/time
columns is raw integers (`Date32Array::value` returns `i32` epoch days;
timestamps are `i64`). Our expression code can build jiff types directly from
those integers and skip Arrow's chrono-returning helpers
(`Date32Type::to_naive_date_opt`, `date32_to_datetime`). So Comet's own code
can move to jiff today.
2. **chrono will remain in the dependency tree until arrow-rs migrates**,
because `arrow` depends on chrono and pulls `chrono-tz` via its feature. Moving
Comet to jiff therefore buys API ergonomics and a single date/time library in
our code, but does not shrink the dependency tree on its own. Full chrono
removal is a separate downstream goal gated on upstream arrow-rs.
3. **jiff keeps epoch-day conversion private on purpose**
(`from_unix_epoch_day` / `to_unix_epoch_day` are crate-private). The public
idiom is an anchor date plus a day `Span`:
- epoch day to date: `Date::constant(1970, 1, 1).checked_add(days.days())`
- date to epoch day: `epoch.until(date).get_days()`
This is slightly more verbose than chrono, so a tiny shared helper module
is worth adding up front.
4. **jiff civil dates span years -9999..=9999**, narrower than chrono and
Date32. Out-of-range values should map to NULL (matching today's nullable
behavior). Not a concern for realistic Spark data, but worth a conscious
decision per expression.
5. **The real cost is the timezone path, not the calendar math.** Date-only
expressions are a near-mechanical translation (chrono `Weekday::days_since`
maps 1:1 to jiff `Weekday::since`, etc.). Timestamp and timezone expressions
are where the work is: they need jiff's tzdb features and a rework of
`timezone.rs` (currently chrono-tz based), plus careful comparison of jiff's tz
behavior against Spark and chrono-tz.
### Suggested approach
- **Phase 1 (low risk):** migrate timezone-free date expressions
(`next_day`, `dayname`/`monthname`, `make_date`, ...) behind a shared
`epoch_day <-> jiff::civil::Date` helper. Use `jiff = { default-features =
false, features = ["std"] }` for these since the tzdb bundles are not needed.
- **Phase 2:** migrate timestamp/timezone expressions, enabling jiff's tzdb
features and reworking `timezone.rs`. Validate tz behavior against Spark.
- **Phase 3:** track arrow-rs for chrono removal; only then can chrono leave
`Cargo.lock`.
Each phase is independently shippable, and Phase 1 can be split per
expression. The spike in #4755 is a working example of the Phase 1 pattern.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]