Hi Yun,

After thinking about it some, here are my thoughts.

> Introducing parameterization would definitely add complexity and increase
> the number of logical types. Also, Parquet currently doesn’t support seconds
> as a time unit. Given that, it might be more practical to start with a
> concrete type like NanoDuration, and add other *Unit*Duration types later
> if needed.

I think the main issue being raised is that this would make it inconsistent with Timestamp logical annotations. I don't think it is critical to add seconds. But I do think we should consider current engine compatibility. Supporting at least microseconds would add value, as I think most existing OSS engines support this range (engines would not need to scale to/from their native resolution). We can defer making a final decision on what resolutions will actually need to be supported until we look at the additional complexity in implementations.

> If that’s the direction, should we define NanoDuration directly using the
> widest size—FLAB(16)—or annotate support for both int64 and FLAB(16)? My
> inclination is to go with the wider type from the start to keep things
> simpler, but I’d love to hear others’ thoughts on this.

I think starting with int64 still makes sense here. It will likely satisfy most use-cases, and in general will be easier for existing OSS engines to integrate with. I responded on the FLBA vs int128 thread, but I think we still need to do some code exploration to understand the true trade-offs here. I think decoupling from the wider type discussion would allow us to get this done sooner.

Cheers,
Micah

On Fri, Jul 11, 2025 at 11:44 AM yun zou <yunzou.colost...@gmail.com> wrote:

> > The Arrow format supports 64-bit durations of seconds, milliseconds,
> > microseconds and nanoseconds. It would make sense for Parquet to
> > roundtrip these types IMHO.
>
> Introducing parameterization would definitely add complexity and increase
> the number of logical types. Also, Parquet currently doesn’t support
> seconds as a time unit.
> Given that, it might be more practical to start with a concrete type
> like NanoDuration, and add other *Unit*Duration types later if needed.
>
> Following up on the related discussion about int128 vs FLAB(16) (see:
> https://lists.apache.org/thread/7zfwc3o53btd2xbdb8bqf8lxsrk76cxr),
> it seems there’s a general preference for sticking with FLAB(16) rather
> than introducing a new int128 type.
>
> If that’s the direction, should we define NanoDuration directly using the
> widest size—FLAB(16)—or annotate support for both int64 and FLAB(16)? My
> inclination is to go with the wider type from the start to keep things
> simpler, but I’d love to hear others’ thoughts on this.
>
> Best Regards,
> Yun Zou
>
> On Fri, Jul 11, 2025 at 9:23 AM Antoine Pitrou <anto...@python.org> wrote:
>
> > On Thu, 10 Jul 2025 17:18:34 -0700
> > yun zou <yunzou.colost...@gmail.com> wrote:
> > > > I think the point was raised previously that hard-coded names were
> > > > preferred but I don't recall if that was when we were still calling
> > > > this DayTime?
> > >
> > > I believe the main concern around naming is focused on whether to
> > > use *"Duration"* or *"Interval"*, rather than the inclusion of *"Nano"*
> > > in the type name.
> > >
> > > As for the parameterized time unit, the primary issue seems to be that
> > > the *physical type size would vary depending on the unit* — for example,
> > > using int32 for milliseconds and int64 for microseconds. However, it
> > > sounds like the proposal is to use the same physical type, the unit is
> > > just used to indicate the type name.
> >
> > Certainly the latter, IMHO.
> >
> > > > I do think it's reasonable to parameterize `TimeUnit`
> > > > for consistency and future proofing but for now we should say it only
> > > > supports Nanoseconds
> > >
> > > It feels a bit odd to introduce a parameter when it currently only
> > > supports a single value.
> > > An alternative could be to start with a concrete type like
> > > *NanoDuration*, and if future requirements arise, we can consider
> > > adding new logical types such as *MicroDuration*, etc.
> > > The disadvantage is that the number of logical type definitions will
> > > increase along with the units we want to support, but I doubt there
> > > will be a lot.
> >
> > The Arrow format supports 64-bit durations of seconds, milliseconds,
> > microseconds and nanoseconds. It would make sense for Parquet to
> > roundtrip these types IMHO.
> >
> > Regards
> >
> > Antoine.