andygrove commented on issue #4754:
URL: 
https://github.com/apache/datafusion-comet/issues/4754#issuecomment-4835284249

   I ran a small spike to scope this concretely, converting one timezone-free 
expression (`next_day`) from chrono to jiff end to end: #4755. Sharing what we 
learned and a suggested approach for the broader work.
   
   ### Key learnings
   
   1. **We do not need to wait on arrow-rs.** The Arrow boundary for date/time 
columns is raw integers (`Date32Array::value` returns `i32` epoch days; 
timestamps are `i64`). Our expression code can build jiff types directly from 
those integers and skip Arrow's chrono-returning helpers 
(`Date32Type::to_naive_date_opt`, `date32_to_datetime`). So Comet's own code 
can move to jiff today.
   
   2. **chrono will remain in the dependency tree until arrow-rs migrates**, 
because `arrow` depends on chrono and pulls `chrono-tz` via its feature. Moving 
Comet to jiff therefore buys API ergonomics and a single date/time library in 
our code, but does not shrink the dependency tree on its own. Full chrono 
removal is a separate downstream goal gated on upstream arrow-rs.
   
   3. **jiff keeps epoch-day conversion private on purpose** 
(`from_unix_epoch_day` / `to_unix_epoch_day` are crate-private). The public 
idiom is an anchor date plus a day `Span`:
      - epoch day to date: `Date::constant(1970, 1, 1).checked_add(days.days())`
      - date to epoch day: `epoch.until(date).get_days()`
      This is slightly more verbose than chrono, so a tiny shared helper module 
is worth adding up front.
   
   4. **jiff civil dates span years -9999..=9999**, narrower than chrono and 
Date32. Out-of-range values should map to NULL (matching today's nullable 
behavior). Not a concern for realistic Spark data, but worth a conscious 
decision per expression.
   
   5. **The real cost is the timezone path, not the calendar math.** Date-only 
expressions are a near-mechanical translation (chrono `Weekday::days_since` 
maps 1:1 to jiff `Weekday::since`, etc.). Timestamp and timezone expressions 
are where the work is: they need jiff's tzdb features and a rework of 
`timezone.rs` (currently chrono-tz based), plus careful comparison of jiff's tz 
behavior against Spark and chrono-tz.
   
   ### Suggested approach
   
   - **Phase 1 (low risk):** migrate timezone-free date expressions 
(`next_day`, `dayname`/`monthname`, `make_date`, ...) behind a shared 
`epoch_day <-> jiff::civil::Date` helper. Use `jiff = { default-features = 
false, features = ["std"] }` for these since the tzdb bundles are not needed.
   - **Phase 2:** migrate timestamp/timezone expressions, enabling jiff's tzdb 
features and reworking `timezone.rs`. Validate tz behavior against Spark.
   - **Phase 3:** track arrow-rs for chrono removal; only then can chrono leave 
`Cargo.lock`.
   
   Each phase is independently shippable, and Phase 1 can be split per 
expression. The spike in #4755 is a working example of the Phase 1 pattern.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to