> I don't think it is critical to add seconds. But I do think we should
> consider current engine compatibility. Supporting at least microseconds
> would add value as I think most existing OSS engines support this range
> (engines would not need to scale to/from their native resolution).

Makes sense. As for implementation, I think we can start with nanoseconds to
understand the complexity first.

> I think starting with int64 still makes sense here.  It will likely
> satisfy most use-cases, and in general will be easier for existing OSS
> engines to integrate with.

Sure, I think we can definitely start with int64.

Based on the above discussion, it appears that we're making a small
adjustment to the plan:
1. Instead of introducing NanoDuration, we will introduce a more general
   Duration type that takes TimeUnit as a parameter and annotates int64,
   starting with support for nanoseconds.
2. The YearMonthInterval type remains the same, and it annotates int32.
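
To make item 1 concrete, here is a rough Python sketch of a Duration value
that carries a TimeUnit and is bounded by int64. It is only an illustration
under assumptions: the class and method names are hypothetical and are not
part of parquet-format or any Parquet implementation, and the MILLIS and
MICROS members are included only to show the scaling that engines with a
different native resolution would need (the plan starts with nanoseconds
only).

```python
from dataclasses import dataclass
from enum import Enum

# Bounds of the int64 physical type that Duration would annotate.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


class TimeUnit(Enum):
    """Hypothetical unit parameter; each value is the number of units per second."""
    MILLIS = 1_000
    MICROS = 1_000_000
    NANOS = 1_000_000_000


@dataclass(frozen=True)
class Duration:
    """Illustrative model of a Duration logical type stored as int64."""
    value: int
    unit: TimeUnit = TimeUnit.NANOS

    def __post_init__(self):
        # Reject values that would not fit in the int64 physical type.
        if not INT64_MIN <= self.value <= INT64_MAX:
            raise OverflowError(f"{self.value} does not fit in int64")

    def to_unit(self, target: TimeUnit) -> "Duration":
        """Rescale to another unit, refusing lossy or overflowing conversions."""
        scaled, remainder = divmod(self.value * target.value, self.unit.value)
        if remainder:
            raise ValueError(f"cannot represent {self} exactly in {target.name}")
        # The constructor re-checks the int64 bounds for the scaled value.
        return Duration(scaled, target)


# An engine whose native resolution is microseconds writing a nanosecond column.
native = Duration(1_500, TimeUnit.MICROS)
stored = native.to_unit(TimeUnit.NANOS)
print(stored.value)
```

With int64 nanoseconds the representable range is roughly plus or minus 292
years, which is the trade-off behind starting with int64 and leaving the
wider type for the separate discussion.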

We will continue evaluating int128 versus FLBA(16), and once a decision is
made, we will proceed with expanding the range of the existing time-related
types.

Best Regards,
Yun

On Fri, Jul 11, 2025 at 12:22 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Hi Yun,
>
> After thinking about it some, here are my thoughts.
>
>
> > Introducing parameterization would definitely add complexity and increase
> > the number of logical types. Also, Parquet currently doesn’t support
> > seconds as a time unit.  Given that, it might be more practical to start
> > with a concrete type like NanoDuration, and add other *Unit*Duration
> > types later if needed.
>
>
> I think the main issue being raised is this would make it inconsistent with
> Timestamp logical annotations.  I don't think it is critical to add
> seconds. But I do think we should consider current engine compatibility.
> Supporting at least microseconds would add value as I think most existing
> OSS engines support this range (engines would not need to scale to/from
> their native resolution).  We can defer making a final decision on what
> resolutions will actually need to be supported until we look at the
> additional complexity in implementations.
>
>
> > If that’s the direction, should we define NanoDuration directly using the
> > widest size, FLBA(16), or annotate support for both int64 and FLBA(16)?
> > My inclination is to go with the wider type from the start to keep things
> > simpler, but I’d love to hear others’ thoughts on this.
>
>
> I think starting with int64 still makes sense here.  It will likely satisfy
> most use-cases, and in general will be easier for existing OSS engines to
> integrate with. I responded on the FLBA vs int128 thread but I think we
> still need to do some code exploration to understand the true trade-offs
> here.  I think decoupling from the wider type discussion would allow us to
> get this done sooner.
>
> Cheers,
> Micah
>
>
>
>
> On Fri, Jul 11, 2025 at 11:44 AM yun zou <yunzou.colost...@gmail.com>
> wrote:
>
> > > The Arrow format supports 64-bit durations of seconds, milliseconds,
> > > microseconds and nanoseconds. It would make sense for Parquet to
> > > roundtrip these types IMHO.
> >
> > Introducing parameterization would definitely add complexity and increase
> > the number of logical types. Also, Parquet currently doesn’t support
> > seconds as a time unit.  Given that, it might be more practical to start
> > with a concrete type like NanoDuration, and add other *Unit*Duration
> > types later if needed.
> >
> > Following up on the related discussion about int128 vs FLBA(16) (see:
> > https://lists.apache.org/thread/7zfwc3o53btd2xbdb8bqf8lxsrk76cxr),
> > it seems there’s a general preference for sticking with FLBA(16) rather
> > than introducing a new int128 type.
> >
> > If that’s the direction, should we define NanoDuration directly using the
> > widest size, FLBA(16), or annotate support for both int64 and FLBA(16)?
> > My inclination is to go with the wider type from the start to keep things
> > simpler, but I’d love to hear others’ thoughts on this.
> >
> > Best Regards,
> > Yun Zou
> >
> > On Fri, Jul 11, 2025 at 9:23 AM Antoine Pitrou <anto...@python.org>
> wrote:
> >
> > > On Thu, 10 Jul 2025 17:18:34 -0700
> > > yun zou <yunzou.colost...@gmail.com>
> > > wrote:
> > > > > I think the point was raised previously that hard-coded names were
> > > > > preferred but I don't recall if that was when we were still calling
> > > > > this DayTime?
> > > >
> > > > I believe the main concern around naming is focused on whether to
> > > > use *"Duration"* or *"Interval"*, rather than the inclusion of
> > > > *"Nano"* in the type name.
> > > >
> > > > As for the parameterized time unit, the primary issue seems to be
> > > > that the *physical type size would vary depending on the unit*, for
> > > > example, using int32 for milliseconds and int64 for microseconds.
> > > > However, it sounds like the proposal is to use the same physical
> > > > type, and the unit is just used to indicate the type name.
> > >
> > > Certainly the latter, IMHO.
> > >
> > > > > I do think it's reasonable to parameterize `TimeUnit`
> > > > > for consistency and future proofing but for now we should say it
> > > > > only supports Nanoseconds
> > > >
> > > > It feels a bit odd to introduce a parameter when it currently only
> > > > supports a single value.
> > > > An alternative could be to start with a concrete type like
> > > > *NanoDuration*, and if future requirements arise, we can consider
> > > > adding new logical types such as *MicroDuration*, etc.
> > > > The disadvantage is that the number of logical type definitions will
> > > > increase along with the units we want to support, but I doubt there
> > > > will be a lot.
> > >
> > > The Arrow format supports 64-bit durations of seconds, milliseconds,
> > > microseconds and nanoseconds. It would make sense for Parquet to
> > > roundtrip these types IMHO.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > >
> >
>
