Sgtm, I think a PMC member needs to kick it off?

On Wednesday, April 3, 2019, Wes McKinney <wesmck...@gmail.com> wrote:

> Agreed
>
> On Wed, Apr 3, 2019 at 9:53 AM Jacques Nadeau <jacq...@apache.org> wrote:
> >
> > Option 1 sounds good to me. Let's take to a vote.
> >
> > On Tue, Apr 2, 2019 at 8:53 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>
> >> Based on the discussion so far, my attempt at concrete Schema proposals
> >> is below. Jacques, I think this summarizes what we've discussed;
> >> apologies if I've misunderstood. Wes, would Option 1 work to support the
> >> Pandas Time Delta use-case? I'm leaning towards Option 1 if it satisfies
> >> everyone (but happy to implement whatever we come to a consensus on).
> >>
> >> ** Option 1:  New Type: **
> >> /// An absolute length of time unrelated to any calendar artifacts.
> >> ///
> >> /// For the purposes of Arrow implementations, adding this value to a
> >> /// Timestamp ("t1") naively (i.e. simply summing the two numbers) is
> >> /// acceptable even though in some cases the resulting Timestamp ("t2")
> >> /// would not account for leap seconds during the elapsed time between
> >> /// "t1" and "t2". Similarly, representing the difference between two
> >> /// Unix timestamps is acceptable, but would yield a value that is
> >> /// possibly a few seconds off from the true elapsed time.
> >> ///
> >> /// The resolution defaults to millisecond, but can be any of the other
> >> /// supported TimeUnit values as with Timestamp and Time types. This
> >> /// type is always represented as an 8-byte integer.
> >> table DurationInterval {
> >>    unit: TimeUnit = MILLISECOND;
> >> }
> >>
> >> ** Option 2: New TimeDelta enum on Interval Unit (strong definition
> >> around leap-seconds): **
> >>
> >> enum IntervalUnit: short { YEAR_MONTH, DAY_TIME, TIME_DELTA }
> >> // A "calendar" interval which models types that don't necessarily
> >> // have a precise duration without the context of a base timestamp (e.g.
> >> // days can differ in length during daylight saving time transitions).
> >> // In the case of TimeDelta, a precise definition may not be possible if
> >> // the base timestamp occurs at an instant when a leap second was added
> >> // (but the result would differ by at most 1 second).
> >> // YEAR_MONTH - Indicates the number of elapsed whole months, stored as
> >> //   4-byte integers.
> >> // DAY_TIME - Indicates the number of elapsed days and milliseconds,
> >> //   stored as 2 contiguous 32-bit integers (8-bytes in total).  Support
> >> //   of this IntervalUnit is not required for full arrow compatibility.
> >> // TIME_DELTA - Indicates absolute time difference between Unix
> >> //   Timestamps (i.e. excluding leap seconds).  This value is always
> >> //   represented as an 8-byte integer.
> >> table Interval {
> >>   unit: IntervalUnit;
> >>   resolution: TimeUnit;  // Only relevant for TIME_DELTA
> >> }
> >>
> >> On Tue, Apr 2, 2019 at 10:03 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >>
> >> > Since there were some mentions of leap seconds:
> >> >
> >> > I think the intent of the timedelta/duration type should be to express
> >> > the difference between UNIX timestamps (from second to nanosecond
> >> > resolution), which don't include leap seconds. We use the
> >> > timedelta64[ns] type in pandas for example, which is a
> >> > nanosecond-resolution difference of UNIX timestamps.
> >> >
> >> > On Tue, Apr 2, 2019 at 10:05 AM Jacques Nadeau <jacq...@apache.org> wrote:
> >> > >
> >> > > >
> >> > > > I could go either way. It has some benefits for forward
> >> > > > compatibility, I suppose, but on the other hand YAGNI. If you feel
> >> > > > strongly, I'm ok including it. However, the more optional fields
> >> > > > we have for a specific enum value, the more I lean towards a new
> >> > > > type instead of just an enum.
> >> > > >
> >> > > I'm okay with skipping for now. Appreciate the focus on only what we
> >> > > actually need.
> >> > >
> >> > >
> >> > >
> >> > > > Could you elaborate on defining standard arithmetic conversions
> >> > > > between time-delta/duration in seconds and other time units (days,
> >> > > > months, years) as part of the standard/format? I'm still not sure
> >> > > > I understand what the use-case is here.
> >> > > >
> >> > >
> >> > > Here goes nothing...
> >> > >
> >> > > Seems like there are two options for durations:
> >> > > 1) they aren't related to any other type
> >> > > 2) they have a relationship to timestamps and dates.
> >> > >
> >> > > If 1, then the only meaning I could understand is real-world
> >> > > duration in terms of how seconds (and fractions thereof) are
> >> > > defined. E.g. [1] :D. In this situation, there is no way to express
> >> > > any unit of time coarser than a second (e.g. days), since it is up
> >> > > to the application implementer to define the relationship. This
> >> > > severely limits the expressiveness of the concept (I can't ever use
> >> > > something like TimeUnit.DAYS) and, I believe, rules out covering the
> >> > > existing interval YEAR_MONTH type (since it has a resolution of
> >> > > months).
> >> > >
> >> > > If 2, then we must define the canonical value of ts + duration,
> >> > > otherwise durations are somewhat meaningless; thus the proposed
> >> > > translation chart (which causes its own oddities depending on the
> >> > > resolution of the time type you are adding to).
> >> > >
> >> > > That being said, having started to remember previous discussions on
> >> > > this, I'm most inclined to simply pick #1 and ignore the need for
> >> > > anything more. The curiousness of interval math in database systems
> >> > > underscores the fact that it apparently doesn't matter that much.
> >> > > In most cases, today + 3 months is close enough to today + 90 days
> >> > > for government work.
> >> > >
> >> > > Let's +2 a patch and get it merged quickly so we never have to think
> >> > > about this again :)
> >> > >
> >> > > [1]  "the duration of 9,192,631,770 periods
> >> > > <https://en.wikipedia.org/wiki/Frequency> of the radiation
> >> > > corresponding to the transition between the two hyperfine levels
> >> > > <https://en.wikipedia.org/wiki/Hyperfine_structure> of the ground
> >> > > state of the caesium-133 <https://en.wikipedia.org/wiki/Caesium-133>
> >> > > atom" (at a temperature of 0 K
> >> > > <https://en.wikipedia.org/wiki/Absolute_zero>)
> >> > >
> >> > > >
> >> >
>
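
For anyone skimming the thread, here is a rough sketch of the "naive" arithmetic that Option 1 describes: a duration is a plain 64-bit count in some TimeUnit, and adding it to a Unix timestamp is simple integer addition that ignores leap seconds. The names TICKS_PER_SECOND, to_unix and add_duration are made up for illustration and are not part of any Arrow API. The example straddles the leap second inserted at the end of 2016-12-31 UTC.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for Arrow's TimeUnit: ticks per second for each unit.
TICKS_PER_SECOND = {
    "SECOND": 1,
    "MILLISECOND": 1_000,
    "MICROSECOND": 1_000_000,
    "NANOSECOND": 1_000_000_000,
}


def to_unix(dt, unit="MILLISECOND"):
    """Convert a whole-second, timezone-aware datetime to a Unix timestamp in `unit`."""
    return int(dt.timestamp()) * TICKS_PER_SECOND[unit]


def add_duration(timestamp, duration):
    """Naive addition: both values are integer counts in the same unit."""
    return timestamp + duration


# One Unix second apart, although a leap second (23:59:60) sits between them,
# so two SI seconds actually elapsed.
t1 = to_unix(datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc))
t2 = to_unix(datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc))

duration_ms = t2 - t1  # difference of Unix timestamps, in milliseconds
print(duration_ms)
print(add_duration(t1, duration_ms) == t2)
```

The difference comes out as one second's worth of milliseconds, which is the "possibly a few seconds off from the true elapsed time" behaviour that the Option 1 comment explicitly accepts.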

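The "today + 3 months" versus "today + 90 days" remark can also be made concrete. The sketch below uses only the Python standard library; add_months is a hypothetical helper, and clamping to the last day of the month is just one possible convention (an assumption here, not something the thread specifies).

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Calendar-style addition of whole months (hypothetical helper).

    Assumes the day of month is clamped to the last valid day of the
    target month, e.g. Jan 31 + 1 month -> Feb 28 in a non-leap year.
    """
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


today = date(2019, 4, 3)
by_months = add_months(today, 3)         # calendar interval (YEAR_MONTH style)
by_days = today + timedelta(days=90)     # fixed-length duration

print(by_months, by_days, (by_months - by_days).days)
```

For this start date the two results land one day apart, which illustrates why a YEAR_MONTH interval cannot be reduced to a fixed duration without a base date.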