> However, the reverse is not guaranteed: a MonthDayNano value cannot
> reliably be converted back into a DayTimeInterval. This is because
> there's no way to determine whether the calendar component is used
> without looking into the data, which introduces ambiguity. This
> ambiguity can negatively impact interoperability across different
> engines and systems.

Ultimately, this is something that systems will need to deal with at some
point, but it can be delayed until someone has the bandwidth to write a
formal proposal for persisting MonthDayNano in Parquet. (It would still be
up to the consuming system to decide how to do the translation, so I'm not
sure that defining the translation is strictly necessary.)

> Regarding whether we should use FLBA(16) or INT128, while INT128 does have
> a natural fitting for ordering, I think one concern I had is if that type
> will only be used by the Day Time Interval.

I think there are a few use cases that have at least been mentioned where it
would be useful to have int128:

1. A replacement for the int96 timestamp that can handle the full range of
   ANSI SQL nanoseconds.
2. Picoseconds have at least been mentioned in passing, and that would
   require int128.

If we don't model it as an int128, we should minimize the range to reflect
what ANSI SQL requires (i.e. FLBA(10), I believe). We should probably allow
the logical type to annotate both int64 and FLBA(10), since int64 is a
common representation for nanoseconds (this is similar to what we already do
for Decimal values).

> Regarding the name for DayTimeInterval, if we all agree that "Duration"
> provides better clarity, I'm fully on board with using that instead.

+1, IIUC this addresses the majority of concerns. If others in the community
want to define a Parquet representation for Arrow's MonthDayNano interval,
that would be welcome as well. I think the main question then becomes, on
the Arrow side, whether we want to define the new type or deal with the
unlikely case of overflow for the duration type.

On Mon, Jul 7, 2025 at 4:38 PM yun zou <yunzou.colost...@gmail.com> wrote:

> Hi,
>
> Thanks all for the valuable feedback!
>
> Regarding the MonthDayNano type, one important point that may not be
> explicitly stated is the lack of true interoperability between
> YearMonthInterval, DayTimeInterval, and MonthDayNano.
>
> While YearMonthInterval and DayTimeInterval are not directly interoperable
> with each other, they can both be converted into MonthDayNano by setting
> certain components to zero. However, the reverse is not guaranteed: a
> MonthDayNano value cannot reliably be converted back into a
> DayTimeInterval. This is because there's no way to determine whether the
> calendar component is used without looking into the data, which introduces
> ambiguity. This ambiguity can negatively impact interoperability across
> different engines and systems.
>
> > Doesn't capture semantics for engines that treat day as a calendar type.
>
> I don't actually see the above as a drawback of introducing two separate
> interval types, since when the day is used as a calendar type, it can be
> mapped to the MonthDayNano type. In fact, I believe all three types are
> necessary to fully support the range of use cases. What’s important is
> that we clearly define the interoperability rules between them to ensure
> consistent behavior across systems.
>
> > While I understand the desire to be able to represent all values
> > allowable in ANSI SQL, I really don't understand why our types should
> > not be allowed to represent any values *outside* of the range allowed
> > in ANSI SQL.
>
> I completely agree: if there are valid use cases beyond ANSI SQL, we
> should absolutely support them. It makes sense to leave range validation
> to the engine or client implementation, as they are best suited to handle
> their own specific requirements.
>
> Regarding whether we should use FLBA(16) or INT128, while INT128 does have
> a natural fitting for ordering, I think one concern I had is if that type
> will only be used by the Day Time Interval.
>
> Regarding the name for DayTimeInterval, if we all agree that "Duration"
> provides better clarity, I'm fully on board with using that instead.
>
> Best Regards,
> Yun
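
For reference, here is a rough sketch of the conversion asymmetry described
in the quoted message. The class and function names are hypothetical and do
not come from the Parquet or Arrow specifications. It also assumes that
DayTimeInterval is stored as a single fixed-length nanosecond count, which
is only one possible layout.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: a "day" in DayTimeInterval is always exactly 24 hours.
NANOS_PER_DAY = 24 * 60 * 60 * 1_000_000_000


@dataclass(frozen=True)
class YearMonthInterval:  # hypothetical model: calendar months only
    months: int


@dataclass(frozen=True)
class DayTimeInterval:  # hypothetical model: fixed-length duration in nanoseconds
    nanos: int


@dataclass(frozen=True)
class MonthDayNano:  # three independent components
    months: int
    days: int  # calendar days, not necessarily 24 hours long
    nanos: int


def from_year_month(v: YearMonthInterval) -> MonthDayNano:
    # Widening is always possible: zero out the unused components.
    return MonthDayNano(months=v.months, days=0, nanos=0)


def from_day_time(v: DayTimeInterval) -> MonthDayNano:
    # Widening is always possible: zero out the calendar components.
    return MonthDayNano(months=0, days=0, nanos=v.nanos)


def to_day_time(v: MonthDayNano) -> Optional[DayTimeInterval]:
    # Narrowing requires inspecting the data: any calendar component
    # means the value has no fixed-length equivalent.
    if v.months != 0 or v.days != 0:
        return None
    return DayTimeInterval(nanos=v.nanos)
```

In this sketch a reader only learns whether narrowing is possible by looking
at each value, which is the ambiguity the quoted message points out.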