> I don't think it is critical to add seconds. But I do think we should consider
> current engine compatibility. Supporting at least microseconds would add
> value as I think most existing OSS engines support this range (engines
> would not need to scale to/from their native resolution).

Makes sense. As for implementation, I think we can start with nanoseconds
to understand the complexity first.

> I think starting with int64 still makes sense here. It will likely satisfy
> most use-cases, and in general will be easier for existing OSS engines to
> integrate with.

Sure, I think we can definitely start with int64.

Based on the above discussion, it appears that we're making a small
adjustment to the plan:

1. Instead of introducing NanoDuration, we will introduce a more general
   Duration type that takes TimeUnit as a parameter and annotates int64,
   starting with support for nanoseconds.
2. The YearMonthInterval type remains the same, and it annotates int32.

We will continue evaluating int128 versus FLBA(16), and once a decision is
made, we will proceed with expanding the range of the existing time-related
types.

Best Regards,
Yun

On Fri, Jul 11, 2025 at 12:22 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Hi Yun,
>
> After thinking about it some, here are my thoughts.
>
> > Introducing parameterization would definitely add complexity and increase
> > the number of logical types. Also, Parquet currently doesn’t support
> > seconds as a time unit. Given that, it might be more practical to start
> > with a concrete type like NanoDuration, and add other *Unit*Duration
> > types later if needed.
>
> I think the main issue being raised is this would make it inconsistent with
> Timestamp logical annotations. I don't think it is critical to add
> seconds. But I do think we should consider current engine compatibility.
> Supporting at least microseconds would add value as I think most existing
> OSS engines support this range (engines would not need to scale to/from
> their native resolution). We can defer making a final decision on what
> resolutions will actually need to be supported until we look at the
> additional complexity in implementations.
>
> > If that’s the direction, should we define NanoDuration directly using the
> > widest size, FLBA(16), or annotate support for both int64 and FLBA(16)?
> > My inclination is to go with the wider type from the start
> > to keep things simpler, but I’d love to hear others’ thoughts on this.
>
> I think starting with int64 still makes sense here. It will likely satisfy
> most use-cases, and in general will be easier for existing OSS engines to
> integrate with. I responded on the FLBA vs int128 thread but I think we
> still need to do some code exploration to understand the true trade-offs
> here. I think decoupling from the wider type discussion would allow us to
> get this done sooner.
>
> Cheers,
> Micah
>
> On Fri, Jul 11, 2025 at 11:44 AM yun zou <yunzou.colost...@gmail.com>
> wrote:
>
> > > The Arrow format supports 64-bit durations of seconds, milliseconds,
> > > microseconds and nanoseconds. It would make sense for Parquet to
> > > roundtrip these types IMHO.
> >
> > Introducing parameterization would definitely add complexity and increase
> > the number of logical types. Also, Parquet currently doesn’t support
> > seconds as a time unit. Given that, it might be more practical to start
> > with a concrete type like NanoDuration, and add other *Unit*Duration
> > types later if needed.
> >
> > Following up on the related discussion about int128 vs FLBA(16) (see:
> > https://lists.apache.org/thread/7zfwc3o53btd2xbdb8bqf8lxsrk76cxr),
> > it seems there’s a general preference for sticking with FLBA(16) rather
> > than introducing a new int128 type.
> >
> > If that’s the direction, should we define NanoDuration directly using the
> > widest size, FLBA(16), or annotate support for both int64 and FLBA(16)?
> > My inclination is to go with the wider type from the start
> > to keep things simpler, but I’d love to hear others’ thoughts on this.
> >
> > Best Regards,
> > Yun Zou
> >
> > On Fri, Jul 11, 2025 at 9:23 AM Antoine Pitrou <anto...@python.org>
> > wrote:
> >
> > > On Thu, 10 Jul 2025 17:18:34 -0700
> > > yun zou <yunzou.colost...@gmail.com>
> > > wrote:
> > > > > I think the point was raised previously that hard-coded names were
> > > > > preferred but I don't recall if that was when we were still calling
> > > > > this DayTime?
> > > >
> > > > I believe the main concern around naming is focused on whether to
> > > > use *"Duration"* or *"Interval"*, rather than the inclusion of
> > > > *"Nano"* in the type name.
> > > >
> > > > As for the parameterized time unit, the primary issue seems to be that
> > > > the *physical type size would vary depending on the unit*, for example,
> > > > using int32 for milliseconds and int64 for microseconds. However, it
> > > > sounds like the proposal is to use the same physical type, the unit is
> > > > just used to indicate the type name.
> > >
> > > Certainly the latter, IMHO.
> > >
> > > > > I do think it's reasonable to parameterize `TimeUnit`
> > > > > for consistency and future proofing but for now we should say it only
> > > > > supports Nanoseconds
> > > >
> > > > It feels a bit odd to introduce a parameter when it currently only
> > > > supports a single value.
> > > > An alternative could be to start with a concrete type like
> > > > *NanoDuration*, and if future requirements arise, we can consider
> > > > adding new logical types such as *MicroDuration*, etc.
> > > > The disadvantage is that the number of logical type definitions will
> > > > increase along with the units we want to support, but I doubt there
> > > > will be a lot.
> > >
> > > The Arrow format supports 64-bit durations of seconds, milliseconds,
> > > microseconds and nanoseconds. It would make sense for Parquet to
> > > roundtrip these types IMHO.
> > >
> > > Regards
> > >
> > > Antoine.
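
To make the adjusted plan from this thread concrete (a Duration logical type that takes TimeUnit as a parameter and annotates int64), here is a minimal Python sketch. It is an illustration only, not Parquet's actual API: the `TimeUnit` enum, the `Duration` class, and the `rescale` and `max_span_years` helpers are hypothetical names, and the 365.25-day year is an assumption used purely for a rough range estimate. Seconds are left out because, as noted above, Parquet currently has no seconds time unit.

```python
from dataclasses import dataclass
from enum import Enum

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 31_557_600  # assumption: 365.25-day year, rough estimate only


class TimeUnit(Enum):
    """Ticks per second for each unit (hypothetical enum for this sketch)."""
    MILLIS = 10**3
    MICROS = 10**6
    NANOS = 10**9


@dataclass(frozen=True)
class Duration:
    """Hypothetical Duration annotation over an int64 physical value."""
    unit: TimeUnit

    def max_span_years(self) -> float:
        """Approximate largest duration an int64 can hold in this unit."""
        return INT64_MAX / self.unit.value / SECONDS_PER_YEAR

    def rescale(self, value: int, source: TimeUnit) -> int:
        """Convert `value` from `source` ticks into this type's unit.

        Raises ValueError if the conversion would lose precision and
        OverflowError if the result does not fit in int64.
        """
        scaled = value * self.unit.value
        if scaled % source.value != 0:
            raise ValueError("conversion would lose precision")
        result = scaled // source.value  # exact, since divisibility was checked
        if not INT64_MIN <= result <= INT64_MAX:
            raise OverflowError("result does not fit in int64")
        return result


if __name__ == "__main__":
    nanos = Duration(TimeUnit.NANOS)
    # An engine with microsecond native resolution writing into a
    # nanosecond-only Duration has to scale every value like this.
    print(nanos.rescale(1_500, TimeUnit.MICROS))
    print(round(nanos.max_span_years(), 1))
```

The `rescale` call is the per-value scaling step that engines with microsecond native resolution would avoid if a microsecond unit were also supported, which is the compatibility point raised above. The range estimate also shows why int64 is likely sufficient for most use-cases: even at nanosecond resolution it covers roughly 292 years in each direction.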