Good point Weston. My proposal was written with the impression that
Arrow does want to define semantics for some of these temporal types,
based on the existing comments in the Schema.fbs file.
For example, here is a quote taken from the comments for the Time type:

/// This definition doesn't allow for leap seconds. Time values from
/// measurements with leap seconds will need to be corrected when ingesting
/// into Arrow (for example by replacing the value 86400 with 86399).

Here is another quote for the Date type:

/// * Milliseconds (64 bits) indicating UNIX time elapsed since the epoch (no
/// leap seconds), where the values are evenly divisible by 86400000

For the Interval type, we have:

// A "calendar" interval which models types that don't necessarily
// have a precise duration without the context of a base timestamp (e.g.
// days can differ in length during day light savings time transitions).

I think pushing the responsibility to define these semantics to the data
producer side is also a perfectly fine design with its own trade-offs. It
would make data exchange between two different systems a little bit harder,
because consumers need to be aware of the semantics defined by the
producer. On the other hand, it does make the producer implementation
easier. It also makes data exchange within the same system more efficient
if that system's temporal type semantics differ from what's defined in
Arrow's spec.

Either way, I think it would be good if we can be consistent about our
temporal type semantics in the spec. If we are making the claim that leap
seconds should not be taken into account for the Time, Timestamp, and Date
types, then it seems natural to make this claim for the Interval type as
well. Alternatively, we could update the spec to make all temporal types
leap-second agnostic.

On Mon, Sep 13, 2021 at 12:03 PM Weston Pace <weston.p...@gmail.com> wrote:
>
> One could define a sorting based on 30 day months, 365 day years, and
> 24 hour days. It would be consistent but can lead to some surprising
> results. It appears that this is what postgres does as I got the
> following ordering for an interval:
>
> 359 days, 12 months, 360 days, 1 year, 365 days, 366 days
>
> On the other hand, Joda time forbids comparison of periods (their
> version of what we call an interval) and offers three ways to convert
> to a duration. There is toDurationFrom(instant),
> toDurationTo(instant) which give durations from specific calendar
> ranges and then there is toStandardDuration() which converts to a
> duration based on 24 hour days. However, toStandardDuration will
> still fail if the period has >0 months or years (presumably because
> months and years are too inconsistent).
>
> I'm not sure though that this is something that Arrow needs to define.
> We aren't specifying any invalid ranges of values. I don't foresee
> any interoperability concerns. A system that treated intervals as
> comparable (and didn't factor in DST, leap years, etc.) will read and
> write intervals the same way as a system that considers intervals
> incomparable.
>
> This question seems to fall into the "compute" space inhabited by
> topics like "is 'false && null' a false value or a null value" and
> "should addition overflow or throw an exception".
>
> On Mon, Sep 13, 2021 at 6:23 AM QP Hou <houqp....@gmail.com> wrote:
> >
> > On Mon, Sep 13, 2021 at 6:18 AM Antoine Pitrou <anto...@python.org> wrote:
> > > The Duration type is defined with a TimeUnit. You are probably thinking
> > > about the Interval type.
> > >
> >
> > Oops, my bad, yes, it should be the Interval type, not Duration.
> >
> > > Ok. How about daylight savings? I suppose they are taken into account
> > > as well.
> > >
> >
> > Yes, the day component in both DAY_TIME and MONTH_DAY_NANO takes
> > daylight saving time into account.
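
To make the two points above a bit more concrete, here is a rough Python
sketch. The helper names (clamp_leap_second, interval_sort_key) are made up
for this email and are not taken from any Arrow implementation. It shows the
leap-second correction that the Time comment asks producers to perform at
ingestion, and the kind of fixed-length normalization (30-day months, 24-hour
days) that would reproduce the postgres-style ordering Weston quoted:

    # Rough sketch only; the names below are invented for illustration.

    SECONDS_PER_DAY = 86400

    def clamp_leap_second(seconds_since_midnight):
        # The Schema.fbs comment for Time suggests correcting leap-second
        # readings at ingestion, e.g. replacing 86400 with 86399.
        return min(seconds_since_midnight, SECONDS_PER_DAY - 1)

    def interval_sort_key(months, days):
        # Fixed-length normalization: 1 year = 12 months, 1 month = 30 days,
        # 1 day = 24 hours.
        return (months * 30 + days) * SECONDS_PER_DAY

    intervals = [
        ("359 days",  (0, 359)),
        ("12 months", (12, 0)),
        ("360 days",  (0, 360)),
        ("1 year",    (12, 0)),   # a year is stored as 12 months
        ("365 days",  (0, 365)),
        ("366 days",  (0, 366)),
    ]

    # A stable sort by the normalized key reproduces the ordering quoted
    # above: 359 days, 12 months, 360 days, 1 year, 365 days, 366 days.
    for label, (months, days) in sorted(
            intervals, key=lambda x: interval_sort_key(*x[1])):
        print(label)

That ordering is internally consistent, but as Weston notes it bakes in
assumptions (no DST, no leap years) that a consumer may not expect.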