alamb opened a new issue, #5200:
URL: https://github.com/apache/arrow-rs/issues/5200

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Values of type [DataType::Interval](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Interval) (spans of time measured in calendar units such as days and months) are tricky to compare because their absolute size (in number of seconds) is not a fixed quantity. For example, `1 month` is 28 days (29 in a leap year) in February but 31 days in December.
   
   This makes the seemingly simple operation of comparing two intervals quite complicated in practice. For example, is `1 month` more or less than `30 days`? The answer depends on which month you are talking about.
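To make the dependence on the calendar concrete, here is a small std-only Rust sketch (not part of arrow-rs) that measures `1 month` starting from the first day of a given month, using the Gregorian leap-year rule. The helpers `is_leap_year`, `days_in_month` and `compare_one_month_to_days` are hypothetical names chosen for illustration.

```rust
use std::cmp::Ordering;

/// Gregorian leap-year rule.
fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Number of days in `month` (1-12) of `year`.
fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap_year(year) => 29,
        2 => 28,
        _ => panic!("month must be in 1..=12, got {month}"),
    }
}

/// Compares `1 month`, measured from the first day of `month`, with `days` days.
fn compare_one_month_to_days(year: i32, month: u32, days: u32) -> Ordering {
    days_in_month(year, month).cmp(&days)
}

fn main() {
    // The same pair of intervals orders differently depending on the anchor month.
    println!("{:?}", compare_one_month_to_days(2023, 2, 30)); // Less
    println!("{:?}", compare_one_month_to_days(2023, 4, 30)); // Equal
    println!("{:?}", compare_one_month_to_days(2023, 12, 30)); // Greater
}
```

Comparing `1 month` with `30 days` yields `Less`, `Equal` or `Greater` depending only on the anchor month, which is exactly the information an `Interval` value does not carry.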
   
   Arrow also includes a [DataType::Duration](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Duration) type that represents a fixed span of time. It suffers from far fewer challenges in comparisons, but may be less intuitive for humans to use.
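By contrast, a duration is a single 64-bit count of a fixed unit (seconds, milliseconds, microseconds or nanoseconds), so ordering two durations reduces to integer comparison once both are expressed in the same unit. The sketch below is std-only and uses a hypothetical `Unit` enum that mirrors Arrow's `TimeUnit` rather than the arrow crate itself.

```rust
use std::cmp::Ordering;

/// Hypothetical mirror of Arrow's `TimeUnit`.
#[derive(Debug, Clone, Copy)]
enum Unit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Converts an `i64` count of `unit` to nanoseconds; `i128` avoids overflow.
fn to_nanos(value: i64, unit: Unit) -> i128 {
    let scale: i128 = match unit {
        Unit::Second => 1_000_000_000,
        Unit::Millisecond => 1_000_000,
        Unit::Microsecond => 1_000,
        Unit::Nanosecond => 1,
    };
    value as i128 * scale
}

/// Durations have a total order: compare their nanosecond counts.
fn cmp_duration(a: (i64, Unit), b: (i64, Unit)) -> Ordering {
    to_nanos(a.0, a.1).cmp(&to_nanos(b.0, b.1))
}

fn main() {
    // 30 days and 10 days expressed as fixed numbers of seconds / milliseconds.
    let thirty_days = (30 * 86_400, Unit::Second);
    let ten_days_ms = (10 * 86_400 * 1_000, Unit::Millisecond);
    println!("{:?}", cmp_duration(ten_days_ms, thirty_days)); // Less
}
```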
   
   https://github.com/apache/arrow-rs/pull/5180 from @berkaysynnada and @ozankabak noted that while in general it is impossible to determine whether an arbitrary `Interval` is greater than, equal to, or less than another arbitrary `Interval`, it is possible to order some intervals. For example, `10 days` is always less than `30 days`, even though it is not possible to know whether `1 month` is less than `30 days` without additional information that is not encoded in the `Interval`.
   
   **Describe the solution you'd like**
   I am not sure.
   
   I filed this ticket to:
   1. Try to describe the challenge better, and
   2. Have a discussion about what, if any, additional semantics are needed.
   
   **Describe alternatives you've considered**
   
   ### Nothing / Improve Documentation
   The current handling of intervals is now documented after https://github.com/apache/arrow-rs/pull/5192, so I hope there is less confusion about the semantics. If downstream crates want to compare intervals, they can do so by implementing their own kernels.
   
   ### New kernels
   One possibility would be to add new comparison kernels that are specific to intervals and implement partial-order interval comparisons, as proposed in https://github.com/apache/arrow-rs/pull/5180:
   
   > A comparison returns true if it holds certainly. Otherwise, it returns 
false. Going back to our example, we return false for both 1 month < 30 days 
and 1 month > 30 days. It is impossible to impose a total order on intervals, 
but we can impose a consistent partial order with this logic.
   
   Something like:
   
   ```rust
   let arr1 = IntervalMonthDayNanoArray::from(...);
   let arr2 = IntervalMonthDayNanoArray::from(...);
   
   // compare arr1 and arr2 using interval-specific logic / partial order;
   // `interval_partial_lt` is a proposed kernel, not an existing arrow-rs API
   let res = interval_partial_lt(&arr1, &arr2);
   ```
   
   This would have the benefit of implementing interval-specific semantics while making it very clear that these kernels differ from the other comparison kernels.
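The "true only if it holds certainly" rule quoted above can be sketched in plain Rust. This is an illustration under stated assumptions, not the code from #5180: `MonthDayNano` is a hypothetical struct mirroring the month/day/nanosecond layout of `IntervalMonthDayNanoArray`, a month is assumed to last between 28 and 31 days, and a day is assumed to be exactly 86,400 seconds (DST and leap seconds ignored). The sketch bounds the difference `a - b`, so months shared by both sides cancel, and reports an ordering only when it holds for every allowed month length. `interval_partial_lt` here is a hypothetical slice-based stand-in for the proposed kernel.

```rust
use std::cmp::Ordering;

/// Hypothetical plain-Rust mirror of one month/day/nanosecond interval value.
#[derive(Debug, Clone, Copy)]
struct MonthDayNano {
    months: i32,
    days: i32,
    nanos: i64,
}

// Assumed model: a day is exactly 86,400 s and a month lasts 28 to 31 days.
const NANOS_PER_DAY: i128 = 86_400 * 1_000_000_000;
const MIN_MONTH_NANOS: i128 = 28 * NANOS_PER_DAY;
const MAX_MONTH_NANOS: i128 = 31 * NANOS_PER_DAY;

/// Returns an ordering only if it holds for every month length in the model.
fn partial_cmp_interval(a: MonthDayNano, b: MonthDayNano) -> Option<Ordering> {
    // Compare via the difference a - b so that shared months cancel out.
    let months = a.months as i128 - b.months as i128;
    let exact = (a.days as i128 - b.days as i128) * NANOS_PER_DAY
        + (a.nanos as i128 - b.nanos as i128);
    // Smallest and largest possible length of the difference, in nanoseconds.
    let (lo, hi) = if months >= 0 {
        (months * MIN_MONTH_NANOS + exact, months * MAX_MONTH_NANOS + exact)
    } else {
        (months * MAX_MONTH_NANOS + exact, months * MIN_MONTH_NANOS + exact)
    };
    if hi < 0 {
        Some(Ordering::Less)
    } else if lo > 0 {
        Some(Ordering::Greater)
    } else if lo == 0 && hi == 0 {
        Some(Ordering::Equal)
    } else {
        None // the answer depends on which months are involved
    }
}

/// Hypothetical element-wise "certainly less than" kernel over two slices.
fn interval_partial_lt(a: &[MonthDayNano], b: &[MonthDayNano]) -> Vec<bool> {
    assert_eq!(a.len(), b.len(), "inputs must have the same length");
    a.iter()
        .zip(b)
        .map(|(x, y)| partial_cmp_interval(*x, *y) == Some(Ordering::Less))
        .collect()
}

fn main() {
    let days = |d| MonthDayNano { months: 0, days: d, nanos: 0 };
    let one_month = MonthDayNano { months: 1, days: 0, nanos: 0 };

    // 10 days < 30 days certainly; 1 month vs 30 days is undetermined.
    let lhs = [days(10), one_month, days(30)];
    let rhs = [days(30), days(30), one_month];
    println!("{:?}", interval_partial_lt(&lhs, &rhs)); // [true, false, false]
    println!("{:?}", partial_cmp_interval(one_month, days(30))); // None
}
```

Because this sketch bounds the difference rather than each operand separately, `1 month + 1 day` is reported as certainly greater than `1 month` under the model, while `1 month` versus `30 days` stays undetermined. Bounding each operand independently would be simpler but would leave more pairs unordered.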
   
   ### Forced cast to `Duration`
   One way to define a deterministic comparison is to convert `months` to `30 day` increments. That would result in well-defined and fast comparisons, but may lead to unintuitive results.
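A minimal sketch of this normalization, assuming a hypothetical `(months, days, nanos)` tuple that mirrors the month/day/nanosecond interval layout, exactly 30 days per month, and 86,400 seconds per day. It produces a total order with one integer comparison per element, and it also shows the unintuitive side: `1 month` compares equal to `30 days` and always greater than `29 days`, even though February has only 28 or 29 days.

```rust
use std::cmp::Ordering;

const NANOS_PER_DAY: i128 = 86_400 * 1_000_000_000;
// Assumption for this approach: every month counts as exactly 30 days.
const DAYS_PER_MONTH: i128 = 30;

/// Hypothetical helper: collapse (months, days, nanos) into a fixed nanosecond count.
fn to_fixed_nanos((months, days, nanos): (i32, i32, i64)) -> i128 {
    (months as i128 * DAYS_PER_MONTH + days as i128) * NANOS_PER_DAY + nanos as i128
}

/// Total order obtained by forcing every interval into a fixed duration.
fn cmp_forced(a: (i32, i32, i64), b: (i32, i32, i64)) -> Ordering {
    to_fixed_nanos(a).cmp(&to_fixed_nanos(b))
}

fn main() {
    let one_month = (1, 0, 0);
    let thirty_days = (0, 30, 0);
    let twenty_nine_days = (0, 29, 0);
    // Well defined and fast, but `1 month` now equals `30 days` and is always
    // longer than `29 days`, even for a 28-day February.
    println!("{:?}", cmp_forced(one_month, thirty_days)); // Equal
    println!("{:?}", cmp_forced(one_month, twenty_nine_days)); // Greater
}
```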
   
   
   
   **Additional context**
   Here is a related discussion in DataFusion: 
https://github.com/apache/arrow-datafusion/issues/8468

