>
> I'm in favor of the CalendarDuration and TimeDuration

I'm bikeshedding now but:

I think YearMonthInterval works for the first type. The language and type
line up with ANSI SQL and an Arrow type, so I think there is little
ambiguity.

The second type, I think, can just be called Duration and be parameterized
with a single enum that contains nanoseconds (which allows it to expand to
support other granularities if needed).  Thoughts?  IIUC, based on all the
discussion, Day Time Interval as proposed aligns with Arrow's definition
of Duration [1]?
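
To make the suggestion concrete, here is a minimal Python sketch of a
Duration logical type parameterized by a single unit enum. The names
TimeUnit, DurationType, and to_nanos are hypothetical, made up for
illustration; they are not taken from the Parquet or Arrow specifications.

```python
from dataclasses import dataclass
from enum import Enum


class TimeUnit(Enum):
    """Hypothetical unit enum: one member today, room for more later."""
    NANOS = 1  # enum value = nanoseconds per unit (illustrative choice)


@dataclass(frozen=True)
class DurationType:
    """Hypothetical logical-type annotation: an elapsed time in a fixed unit."""
    unit: TimeUnit = TimeUnit.NANOS

    def to_nanos(self, value: int) -> int:
        # Scale a stored integer to nanoseconds using the unit's multiplier.
        return value * self.unit.value


duration = DurationType()
one_day = duration.to_nanos(86_400 * 1_000_000_000)
```

Adding a coarser or finer granularity later would only mean adding an enum
member, which is the extensibility argument above.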

> I don't have a problem with FLBA(10) but I would hope
> we could do some better encoding tricks with an Int128


Yes, hopefully we can get some better integer encodings in place that
would apply across the board.
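
For a rough sense of the ranges being compared, the sketch below computes
how many whole days a signed nanosecond counter of a given width can hold.
It assumes FLBA(10) would store a signed two's-complement 80-bit integer,
which is my assumption and not something settled in this thread; the helper
name max_whole_days is hypothetical.

```python
NANOS_PER_DAY = 86_400 * 1_000_000_000


def max_whole_days(bits: int) -> int:
    """Largest whole-day count a signed `bits`-wide nanosecond counter holds."""
    return (2 ** (bits - 1) - 1) // NANOS_PER_DAY


int64_days = max_whole_days(64)    # 106751 days, roughly 292 years
flba10_days = max_whole_days(80)   # assumed signed 80-bit value in 10 bytes
int128_days = max_whole_days(128)
```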

Cheers,
Micah

[1] https://github.com/apache/arrow/blob/main/format/Schema.fbs#L423

On Tue, Jul 8, 2025 at 9:28 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> I'm in favor of the CalendarDuration and TimeDuration types as better names
> for what we are trying to express here. I also think going forward with
> Int64 for now probably makes sense, with us also doing some work to start
> getting an official int128 in as well. I don't have a problem with FLBA(10)
> but I would hope we could do some better encoding tricks with an Int128. I'm
> a relative novice in this area so take that with a grain of salt.
>
> On Mon, Jul 7, 2025 at 10:39 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > >
> > > However, the reverse is not guaranteed: a MonthDayNano value cannot
> > > reliably be converted back
> > > into a DayTimeInterval. This is because there's no way to determine
> > whether
> > > the calendar component
> > > is used without looking into the data, which introduces ambiguity. This
> > > ambiguity can negatively impact
> > > interoperability across different engines and systems.
> >
> >
> > Ultimately, this is something that systems will need to deal with at some
> > point, but this can be delayed until someone has the bandwidth to write a
> > formal proposal for persisting MonthDayNano in parquet (and it would still
> > be up to the consuming system how to do the translation, so I'm not clear
> > that defining the translation is strictly necessary).
> >
> >
> > > Regarding whether we should use FLBA(16) or INT128, while INT128 does
> > have
> > > a natural
> > > fitting for ordering, I think one concern I had is if that type will
> only
> > > be used by the Day Time Interval.
> >
> >
> > I think there are a few use-cases that have at least been mentioned where
> > it would be useful to have int128:
> >
> > 1.  A replacement for int96 timestamp that can handle the full range of
> > ANSI SQL Nanoseconds.
> > 2.  Picoseconds has at least been mentioned in passing and that would
> > require int128.
> >
> > If we don't model it as an int128, we should minimize the range to reflect
> > what ANSI SQL requires (i.e. FLBA(10) I believe). We should probably allow
> > the logical type to annotate both int64 and FLBA(10), since int64 is a
> > common representation for nanoseconds (this is similar to what we already
> > do for Decimal values).
> >
> > > Regarding the name for DayTimeInterval, if we all agree that "Duration"
> > > provides better clarity,
> > > I'm fully on board with using that instead.
> >
> >
> > +1, IIUC I think this addresses the majority of concerns.  If others in
> > the community want to define a parquet representation for Arrow's
> > MonthDayNano interval, that would be welcome as well. I think the main
> > question then becomes whether, on the Arrow side, we want to define the
> > new type or deal with the unlikely case of overflow for the duration type.
> >
> >
> >
> > On Mon, Jul 7, 2025 at 4:38 PM yun zou <yunzou.colost...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > Thanks all for the valuable feedback!
> > >
> > > Regarding the MonthDayNano type, one important point that may not be
> > > explicitly stated
> > > is the lack of true interoperability between YearMonthInterval,
> > > DayTimeInterval, and MonthDayNano.
> > >
> > > While YearMonthInterval and DayTimeInterval are not directly
> > interoperable
> > > with each other,
> > > they can both be converted into MonthDayNano by setting certain
> > components
> > > to zero.
> > > However, the reverse is not guaranteed: a MonthDayNano value cannot
> > > reliably be converted back
> > > into a DayTimeInterval. This is because there's no way to determine
> > whether
> > > the calendar component
> > > is used without looking into the data, which introduces ambiguity. This
> > > ambiguity can negatively impact
> > > interoperability across different engines and systems.
> > >
> > > > Doesn't capture semantics for engines that treat day as a calendar
> > type.
> > > I don't actually see the above as a drawback of introducing two
> separate
> > > interval types,
> > > since when the day is used as a calendar type, it can be mapped to the
> > > MonthDayNano type.
> > > In fact, I believe all three types are necessary to fully support the
> > range
> > > of use cases.
> > > What’s important is that we clearly define the interoperability rules
> > > between them to ensure
> > > consistent behavior across systems.
> > >
> > > > While I understand the desire to be able to represent all values
> > > > allowable in ANSI SQL, I really don't understand why our types should
> > > > not be allowed to represent any values *outside* of the range allowed
> > > > in ANSI SQL.
> > > I completely agree—if there are valid use cases beyond ANSI SQL, we
> > should
> > > absolutely support them. It makes sense to leave range validation to
> the
> > > engine or
> > > client implementation, as they are best suited to handle their own
> > specific
> > > requirements.
> > >
> > > Regarding whether we should use FLBA(16) or INT128, while INT128 does
> > have
> > > a natural
> > > fitting for ordering, I think one concern I had is if that type will
> only
> > > be used by the Day Time Interval.
> > >
> > > Regarding the name for DayTimeInterval, if we all agree that "Duration"
> > > provides better clarity,
> > > I'm fully on board with using that instead.
> > >
> > > Best Regards,
> > > Yun
> > >
> >
>
