Hi Team,

Thank you all for the valuable feedback on the INTERVAL type during today’s community sync. I'd like to continue the discussion in this email thread.
The primary focus of the conversation was the proposed INTERVAL type's *compatibility with Apache Arrow*. Several key issues were raised:

1. Naming of DayTimeInterval
While the name DayTimeInterval closely follows the SQL standard and matches the naming conventions used by most engines, some participants suggested that a name emphasizing precision, such as DayNanoInterval, might be clearer.

2. Mapping DayTimeInterval to Arrow's MonthDayNano
Mapping DayTimeInterval to Arrow's MonthDayNano type is problematic because of semantic differences:
a) MonthDayNano combines calendar-based and duration-based components, whereas DayTimeInterval represents a pure duration.
b) MonthDayNano allows mixed signs across components (e.g., positive months and negative days), which complicates comparison and evaluation.
Given these differences, MonthDayNano is not a suitable representation for DayTimeInterval.

3. Memory footprint: are 16 bytes necessary for DayTimeInterval?
a) Some engines (e.g., Spark, Trino) represent DayTimeInterval using only 8 bytes, while others (e.g., Oracle, Snowflake) support a wider range that may require more than 8 bytes. There is also interest in supporting higher precision in the future, such as picoseconds, which would likewise demand a larger footprint.
b) One proposal is to parameterize the size or precision, allowing engines to define their own representations. However, this approach adds complexity and makes standardization difficult. A fixed-size format that provides enough range for most use cases is considered more robust.
c) Several alternative strategies have been proposed:
i) Use a 10-byte array, which is likely sufficient for all current engine requirements.
ii) Use a 16-byte array now, with the option to evolve it into a standardized int128 in the future.
iii) Start with an int64 representation and plan a future transition to int128, updating related types such as timestamps and intervals in parallel.
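The comparison problem with mixed-sign MonthDayNano values can be illustrated with a minimal Python sketch. It assumes months are applied before days and nanoseconds (an evaluation order chosen for illustration), and `MonthDayNano`, `add_months`, and `effective_duration` are hypothetical helpers, not Arrow APIs. The same value of +1 month and -30 days is a positive duration when anchored at 2024-01-01 but a negative one when anchored at 2024-02-01, so it cannot be ordered against zero without a reference date.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class MonthDayNano(NamedTuple):
    """Hypothetical model of an Arrow-style MonthDayNano value:
    three independently signed components."""
    months: int
    days: int
    nanos: int


def add_months(ts: datetime, months: int) -> datetime:
    # Calendar month arithmetic; assumes the day of month exists in the
    # target month (otherwise datetime.replace raises ValueError).
    total = ts.year * 12 + (ts.month - 1) + months
    return ts.replace(year=total // 12, month=total % 12 + 1)


def effective_duration(start: datetime, iv: MonthDayNano) -> timedelta:
    # Assumed evaluation order: months first, then days, then nanoseconds.
    # Python datetimes have microsecond resolution, so nanos are floored to microseconds.
    end = add_months(start, iv.months) + timedelta(days=iv.days, microseconds=iv.nanos // 1000)
    return end - start


iv = MonthDayNano(months=1, days=-30, nanos=0)
print(effective_duration(datetime(2024, 1, 1), iv))  # 1 day, 0:00:00
print(effective_duration(datetime(2024, 2, 1), iv))  # -1 day, 0:00:00
```

A pure duration such as DayTimeInterval has no such dependency on an anchor date, which is why it can be compared directly.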
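As a rough check on the sizes discussed under the memory footprint question, the following Python sketch computes the smallest two's-complement width needed to hold a nanosecond count. It assumes an Oracle-style INTERVAL DAY(9) TO SECOND(9), i.e. a maximum day precision of 9 digits; the constants and the `min_signed_bytes` helper are illustrative assumptions, not part of the proposal.

```python
# Hypothetical sizing helper: smallest whole-byte two's-complement width
# that can represent every value in [-max_magnitude, max_magnitude].
def min_signed_bytes(max_magnitude: int) -> int:
    bits = max_magnitude.bit_length() + 1  # +1 for the sign bit
    return (bits + 7) // 8


NS_PER_DAY = 86_400 * 10**9

# Range of a plain int64 nanosecond count, expressed in (Julian) years.
INT64_NS_YEARS = (2**63 - 1) / (NS_PER_DAY * 365.25)

# Assumption: Oracle-style INTERVAL DAY(9) TO SECOND(9), whose largest value is
# 999999999 days 23:59:59.999999999, i.e. one nanosecond short of 10**9 days.
ORACLE_MAX_NS = 10**9 * NS_PER_DAY - 1
# The same range at picosecond precision (1 ns = 1000 ps).
ORACLE_MAX_PS = 10**9 * NS_PER_DAY * 1000 - 1

if __name__ == "__main__":
    print(f"int64 nanoseconds covers about +/-{INT64_NS_YEARS:.0f} years")
    print("Oracle-style range, nanoseconds:", min_signed_bytes(ORACLE_MAX_NS), "bytes")
    print("Oracle-style range, picoseconds:", min_signed_bytes(ORACLE_MAX_PS), "bytes")
```

Under these assumptions, an int64 nanosecond count covers roughly ±292 years, the Oracle-style range fits in 10 bytes (78 bits) at nanosecond precision, and the same range at picosecond precision needs 11 bytes, which is one way to weigh the 10-byte option against a 16-byte layout.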
Looking forward to hearing your thoughts on this!

Link to the proposal: https://docs.google.com/document/d/12ghQxWxyAhSQeZyy0IWiwJ02gTqFOgfYm8x851HZFLk/edit?tab=t.0
Link to the PR: https://github.com/apache/parquet-format/pull/496/files

Best Regards,
Yun