alamb opened a new issue, #5200:
URL: https://github.com/apache/arrow-rs/issues/5200

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Data of type [DataType::Interval](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Interval) (spans of time measured in calendar units like days and months) are tricky to compare because their absolute size (in number of seconds) is not a fixed quantity. For example, `1 month` is 28 days in February (of a non-leap year) but 31 days in December.

This makes the seemingly simple operation of comparing two intervals quite complicated in practice. For example, is `1 month` more or less than `30 days`? The answer depends on which month you are talking about.

Arrow also includes a type [DataType::Duration](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Duration) that represents a fixed length of time and is much easier to compare, but may not be as intuitive for humans to use.

https://github.com/apache/arrow-rs/pull/5180 from @berkaysynnada and @ozankabak noted that, while in general it is impossible to determine whether an arbitrary `Interval` is greater than, equal to, or less than another arbitrary `Interval`, it is possible to order some intervals. For example, `10 days` is always less than `30 days`, even though it is not possible to know if `1 month` is less than `30 days` without additional information that is not encoded in the `Interval`.

**Describe the solution you'd like**

I am not sure. I filed this ticket to

1. Try to describe the challenge better
2. Have a discussion about what, if any, additional semantics are needed

**Describe alternatives you've considered**

### Nothing / Improve Documentation

The current handling of intervals is now documented after https://github.com/apache/arrow-rs/pull/5192 so I hope there is less confusion about the semantics.
If downstream crates want to compare intervals, they can do so by implementing their own kernels.

### New kernels

One possibility would be to add new comparison kernels that are specific to intervals and implement partial order interval comparisons, as proposed in https://github.com/apache/arrow-rs/pull/5180:

> A comparison returns true if it holds certainly. Otherwise, it returns false. Going back to our example, we return false for both 1 month < 30 days and 1 month > 30 days. It is impossible to impose a total order on intervals, but we can impose a consistent partial order with this logic.

Something like

```rust
let arr1 = IntervalMonthDayNanoArray::from(...);
let arr2 = IntervalMonthDayNanoArray::from(...);
// compare arr1 and arr2 using interval specific logic / partial order
// (`interval_partial_lt` is a hypothetical name for the proposed kernel)
let res = interval_partial_lt(&arr1, &arr2);
```

This would have the benefit of implementing interval-specific semantics, and it would be very clear that it differs from the other comparison kernels.

### Forced cast to `Duration`

One way to define a deterministic comparison is to convert `months` to `30 day` increments. That would result in well-defined and fast comparisons, but may lead to unintuitive results (for example, `1 month` would always compare equal to `30 days`).

**Additional context**

Here is a related discussion in DataFusion: https://github.com/apache/arrow-datafusion/issues/8468
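
To make the difference between the partial order and the `30 day` conversion discussed above more concrete, here is a minimal sketch in plain Rust that does not use the arrow crate. Everything in it is an assumption made for illustration: `MonthDayNano` is a hypothetical stand-in for a single `IntervalMonthDayNano` value, the function names are made up, a month is assumed to last between 28 and 31 days, and a day is assumed to last exactly 24 hours. It is not the logic from the PR linked above and may well differ from it.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for one `IntervalMonthDayNano` value (not an arrow type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MonthDayNano {
    months: i32,
    days: i32,
    nanos: i64,
}

/// Assumption: every day lasts exactly 24 hours.
const NANOS_PER_DAY: i128 = 86_400_000_000_000;

impl MonthDayNano {
    /// Smallest and largest possible length in nanoseconds,
    /// assuming each month lasts between 28 and 31 days.
    fn bounds(self) -> (i128, i128) {
        let m = self.months as i128;
        // For a negative month count, the longer month gives the smaller value.
        let (lo_days, hi_days) = if m >= 0 {
            (m * 28, m * 31)
        } else {
            (m * 31, m * 28)
        };
        let fixed = self.days as i128 * NANOS_PER_DAY + self.nanos as i128;
        (lo_days * NANOS_PER_DAY + fixed, hi_days * NANOS_PER_DAY + fixed)
    }
}

/// Partial order: `Some(..)` only if the ordering holds for every
/// possible month length, `None` if it depends on the calendar.
fn interval_partial_cmp(a: MonthDayNano, b: MonthDayNano) -> Option<Ordering> {
    if a == b {
        return Some(Ordering::Equal);
    }
    let (a_lo, a_hi) = a.bounds();
    let (b_lo, b_hi) = b.bounds();
    if a_hi < b_lo {
        Some(Ordering::Less)
    } else if a_lo > b_hi {
        Some(Ordering::Greater)
    } else if a_lo == a_hi && b_lo == b_hi && a_lo == b_lo {
        // Neither side contains months and both have the same fixed length.
        Some(Ordering::Equal)
    } else {
        None
    }
}

/// "Returns true only if it holds certainly" flavor of less-than.
fn certainly_lt(a: MonthDayNano, b: MonthDayNano) -> bool {
    interval_partial_cmp(a, b) == Some(Ordering::Less)
}

/// Alternative: treat every month as exactly 30 days, so that a
/// comparison always yields an answer.
fn normalized_30_day_cmp(a: MonthDayNano, b: MonthDayNano) -> Ordering {
    let total = |v: MonthDayNano| {
        (v.months as i128 * 30 + v.days as i128) * NANOS_PER_DAY + v.nanos as i128
    };
    total(a).cmp(&total(b))
}
```

With these definitions, `interval_partial_cmp` yields `Some(Less)` for `10 days` versus `30 days` and `None` for `1 month` versus `30 days`, while `normalized_30_day_cmp` reports the latter pair as equal. The 28 to 31 day bound is deliberately conservative: it can answer `None` for multi-month intervals that a calendar-aware bound could still order.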
