I don't know many examples of interval being used in the real world. But here's the kind of thing: the policy is that an offer is open for 60 hours, so if the offer is made to a particular customer at 12:34pm on Sunday, you want to compute that it ends at 12:34am on Wednesday. The interval "60 hours" is really just syntactic sugar for 216,000 seconds. You could write it as interval '60' hour, or interval '2:12' day to hour, or interval '216000' second, but the values and underlying representation are the same. (Interval '1:36' day to hour is not a valid value, because 36 is out of the valid hour range 0..23, but you could construct the value using interval '1' day + 36 * interval '1' hour.)
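To make the arithmetic concrete, here is a minimal Python sketch of the offer example. It is only an illustration, not a proposal for Arrow: the calendar date (Sunday, 5 November 2017) and the variable names are my own assumptions. Note that Python's datetime.timedelta normalizes its arguments into a single value, so in this sketch it plays the role of the SQL-style interval described above, where every spelling of "60 hours" is the same value.

```python
from datetime import datetime, timedelta

# Hypothetical offer time: 12:34pm on Sunday, 5 November 2017 (naive, no time zone).
offer_made = datetime(2017, 11, 5, 12, 34)

# Different spellings of the same 60-hour span.
as_hours = timedelta(hours=60)
as_days_hours = timedelta(days=2, hours=12)
as_seconds = timedelta(seconds=216_000)
# "1 day 36 hours" built by arithmetic, as in interval '1' day + 36 * interval '1' hour.
built_up = timedelta(days=1) + 36 * timedelta(hours=1)

# All four normalize to the same underlying value.
same_value = as_hours == as_days_hours == as_seconds == built_up

# Adding the interval to the offer time gives the expiry time.
offer_ends = offer_made + as_hours
print(same_value, offer_ends.isoformat())
```

Because the value is a single count of elapsed time, "timestamp + interval" is one addition and there is no separate day component to carry.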
My understanding is that a timedelta (2 day 12 hours) is different from timedelta (60 hours) and timedelta (1 day 36 hours), but all are valid timedelta values. For my offer expiration example the SQL-style interval is sufficient, because there is no material difference between 2:12 and 1:36. But I am sure you can provide use cases where timedelta is necessary.

I don't claim one is better than the other, and I'm not volunteering to implement either of them, so I don't have a say which you should do first. But please keep the names "interval" and "timedelta", so the various communities aren't confused about semantics.

On Wed, Nov 8, 2017 at 2:15 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> Pleading ignorance on use of the SQL interval type, my prior would be
> that many algorithms would first convert the interval components into
> an absolute timedelta. Is that not the case?
>
> My preference right now would be to have a single Interval type, where
> the DAY_TIME type actually contains an absolute delta based on the
> indicated unit. It is true that the divmod operation to decompose into
> number of days and intraday units (milliseconds, nanoseconds, etc.) is
> not the cheapest, but I don't know the use cases for the type well
> enough to judge.
>
> On Wed, Nov 8, 2017 at 5:10 PM, Jacques Nadeau <jacq...@apache.org> wrote:
>> I'm all for moving interval to the new definition. I think we should avoid
>> introducing a timedelta type until it is really important. We need several
>> users demanding a type before we should implement it. Otherwise, we have
>> huge amounts of type bloat (which means nothing will fully implement the
>> spec and be able to interoperate).
>>
>> On Sat, Nov 4, 2017 at 3:46 PM, Julian Hyde <jh...@apache.org> wrote:
>>
>>> As I understand it, the proposal is to have both an interval data type[1]
>>> and a timedelta type[2]. The interval is compatible with the SQL standard
>>> (but not Postgres) and can be implemented with a single numeric value
>>> representing a particular time unit (year, month, day, hour, minute,
>>> second, and possibly fractional seconds); timedelta is an array of numeric
>>> values, one for a set of time units.
>>>
>>> I think we should have both, and operators to convert between them.
>>> Interval is certainly efficient, and is what some applications need, but
>>> some applications need timedelta.
>>>
>>> Julian
>>>
>>> [1] https://issues.apache.org/jira/browse/ARROW-352
>>>
>>> [2] https://issues.apache.org/jira/browse/ARROW-835
>>>
>>> > On Nov 4, 2017, at 1:26 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>>> >
>>> > It seems like we don't have enough input on this topic to make a
>>> > decision right now. I placed the JIRA ARROW-352 in the 0.9.0
>>> > milestone, but we really should try to get this done soon so that
>>> > downstream users are not blocked on using Arrow to send around
>>> > interval data.
>>> >
>>> > - Wes
>>> >
>>> > On Fri, Oct 20, 2017 at 12:34 AM, Li Jin <ice.xell...@gmail.com> wrote:
>>> >> +1 on this one.
>>> >>
>>> >> My reason is this makes timestamp/interval calculation faster, i.e,
>>> >> "timestamp + interval < timestamp" should be faster without dealing with
>>> >> two component in interval. Although I am not quite sure about the rational
>>> >> behind the two component representation, which seems to be what is used in
>>> >> Spark:
>>> >>
>>> >> https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
>>> >>
>>> >> I am interested in hearing reasoning behind two component.
>>> >>
>>> >> On Wed, Oct 18, 2017 at 8:32 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>>> >>
>>> >>> I opened this patch over 2 months ago to add some additional metadata
>>> >>> for intervals:
>>> >>>
>>> >>> https://github.com/apache/arrow/pull/920
>>> >>>
>>> >>> Java supports a two-component DAY_TIME interval type as a combo of
>>> >>> days and milliseconds:
>>> >>>
>>> >>> https://github.com/apache/arrow/blob/402baa4ec391b61dd37c770ae7978d51b9b550fa/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L106
>>> >>>
>>> >>> I propose that we change the interval representation to be a number of
>>> >>> elapsed units of time from a particular point in time. This unit
>>> >>> choices would be the same as our unit for timestamps, so an interval
>>> >>> can be viewed as a delta between two timestamps of some resolution
>>> >>> (second through nanoseconds) [1].
>>> >>>
>>> >>> As context, a number of systems I have worked with deal in absolute
>>> >>> time deltas. In pandas, for example, the difference of timestamps
>>> >>> (datetime64 values) is a timedelta:
>>> >>>
>>> >>> In [1]: import pandas as pd
>>> >>>
>>> >>> In [2]: dr1 = pd.date_range('1/1/2000', periods=5)
>>> >>>
>>> >>> In [3]: dr2 = pd.date_range('1/2/2000', periods=5)
>>> >>>
>>> >>> In [4]: dr1 - dr2
>>> >>> Out[4]: TimedeltaIndex(['-1 days', '-1 days', '-1 days', '-1 days',
>>> >>> '-1 days'], dtype='timedelta64[ns]', freq=None)
>>> >>>
>>> >>> In [5]: (dr1 - dr2).values
>>> >>> Out[5]:
>>> >>> array([-86400000000000, -86400000000000, -86400000000000, -86400000000000,
>>> >>> -86400000000000], dtype='timedelta64[ns]')
>>> >>>
>>> >>> We need to be able to represent this data coherently (up to nanosecond
>>> >>> resolution) with the Arrow metadata, and we will also at some point
>>> >>> need to perform analytics directly on this data type.
>>> >>>
>>> >>> An alternative proposal to changing the DAY_TIME interval
>>> >>> representation is to add another kind of interval type, so instead of
>>> >>> only YEAR_MONTH and DAY_TIME, we have TIMEDELTA. The downside of this,
>>> >>> of course, is the extra implementation complexity. DAY_TIME with the
>>> >>> current Java representation also seems to me to be a subset of what
>>> >>> you can represent with TIMEDELTA.
>>> >>>
>>> >>> It would be great to make a decision about this so we can get this
>>> >>> metadata finalized in the 0.8.0 release.
>>> >>>
>>> >>> Thanks
>>> >>> Wes
>>> >>>
>>> >>> [1]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L135