Hi Yun,

After thinking about it some more, here are my thoughts.


> Introducing parameterization would definitely add complexity and increase
> the number of logical types. Also, Parquet currently doesn’t support
> seconds as a time unit.  Given that, it might be more practical to start
> with a concrete type like NanoDuration, and add other *Unit*Duration types
> later if needed.


I think the main issue being raised is that this would make it inconsistent
with the Timestamp logical annotations.  I don't think it is critical to add
seconds, but I do think we should consider compatibility with current
engines. Supporting at least microseconds would add value, as I think most
existing OSS engines support this range (engines would not need to scale
to/from their native resolution).  We can defer the final decision on which
resolutions actually need to be supported until we have looked at the
additional complexity in implementations.


> If that’s the direction, should we define NanoDuration directly using the
> widest size, FLBA(16), or annotate support for both int64 and FLBA(16)?
> My inclination is to go with the wider type from the start to keep things
> simpler, but I’d love to hear others’ thoughts on this.


I think starting with int64 still makes sense here.  It will likely satisfy
most use cases, and in general will be easier for existing OSS engines to
integrate with. I responded on the FLBA vs int128 thread, but I think we
still need to do some code exploration to understand the true trade-offs
there.  I think decoupling this from the wider-type discussion would allow
us to get this done sooner.
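
To make the range and scaling points above concrete, here is a rough
Python sketch. It is illustrative only and not taken from any engine or
from parquet-format; the helper names (max_years, rescale) and the unit
labels are made up for this example. It shows how many years a signed
64-bit duration covers at each resolution, and the overflow check an
engine would need if it had to scale its native resolution to another one.

```python
# Illustrative sketch only: helper names and unit labels are hypothetical.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
TICKS_PER_SECOND = {"millis": 10**3, "micros": 10**6, "nanos": 10**9}


def max_years(unit: str) -> float:
    """Approximate span, in Julian years, of a non-negative int64 duration."""
    seconds = INT64_MAX / TICKS_PER_SECOND[unit]
    return seconds / (365.25 * 24 * 3600)


def rescale(value: int, src: str, dst: str) -> int:
    """Convert a duration between resolutions, refusing lossy or overflowing results."""
    src_ticks, dst_ticks = TICKS_PER_SECOND[src], TICKS_PER_SECOND[dst]
    if dst_ticks >= src_ticks:
        # Going to a finer resolution: multiply, which may overflow int64.
        result = value * (dst_ticks // src_ticks)
    else:
        # Going to a coarser resolution: only allow exact conversions.
        factor = src_ticks // dst_ticks
        if value % factor != 0:
            raise ValueError("conversion would lose precision")
        result = value // factor
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError("result does not fit in int64")
    return result


if __name__ == "__main__":
    for unit in TICKS_PER_SECOND:
        print(unit, round(max_years(unit)))
    print(rescale(1_500_000, "micros", "nanos"))
```

With int64, nanoseconds cover roughly 292 years in each direction and
microseconds roughly 292,000 years, so a microsecond-native engine writing
into a nanosecond-only type would need the overflow check for long
durations.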

Cheers,
Micah




On Fri, Jul 11, 2025 at 11:44 AM yun zou <yunzou.colost...@gmail.com> wrote:

> > The Arrow format supports 64-bit durations of seconds, milliseconds,
> > microseconds and nanoseconds. It would make sense for Parquet to
> > roundtrip these types IMHO.
>
> Introducing parameterization would definitely add complexity and increase
> the number of logical types. Also, Parquet currently doesn’t support
> seconds as a time unit.  Given that, it might be more practical to start
> with a concrete type like NanoDuration, and add other *Unit*Duration types
> later if needed.
>
> Following up on the related discussion about int128 vs FLBA(16) (see:
> https://lists.apache.org/thread/7zfwc3o53btd2xbdb8bqf8lxsrk76cxr),
> it seems there’s a general preference for sticking with FLBA(16) rather
> than introducing a new int128 type.
>
> If that’s the direction, should we define NanoDuration directly using the
> widest size, FLBA(16), or annotate support for both int64 and FLBA(16)?
> My inclination is to go with the wider type from the start to keep things
> simpler, but I’d love to hear others’ thoughts on this.
>
> Best Regards,
> Yun Zou
>
> On Fri, Jul 11, 2025 at 9:23 AM Antoine Pitrou <anto...@python.org> wrote:
>
> > On Thu, 10 Jul 2025 17:18:34 -0700
> > yun zou <yunzou.colost...@gmail.com>
> > wrote:
> > > > I think the point was raised previously that hard-coded names were
> > > > preferred but I don't recall if that was when we were still calling
> > > > this DayTime?
> > >
> > > I believe the main concern around naming is focused on whether to
> > > use *"Duration"* or *"Interval"*, rather than the inclusion of *"Nano"*
> > > in the type name.
> > >
> > > As for the parameterized time unit, the primary issue seems to be that
> > > the *physical type size would vary depending on the unit*, for example,
> > > using int32 for milliseconds and int64 for microseconds. However, it
> > > sounds like the proposal is to use the same physical type, the unit is
> > > just used to indicate the type name.
> >
> > Certainly the latter, IMHO.
> >
> > > > I do think it's reasonable to parameterize `TimeUnit`
> > > > for consistency and future proofing but for now we should say it only
> > > > supports Nanoseconds
> > >
> > > It feels a bit odd to introduce a parameter when it currently only
> > > supports a single value.
> > > An alternative could be to start with a concrete type like
> > > *NanoDuration*, and if future requirements arise, we can consider
> > > adding new logical types such as *MicroDuration*, etc.
> > > The disadvantage is that the number of logical type definitions will
> > > increase along with the units we want to support, but I doubt there
> > > will be a lot.
> >
> > The Arrow format supports 64-bit durations of seconds, milliseconds,
> > microseconds and nanoseconds. It would make sense for Parquet to
> > roundtrip these types IMHO.
> >
> > Regards
> >
> > Antoine.
> >
> >
> >
>
