Hi Team,

Thank you all for the valuable feedback on the INTERVAL type during today’s
community sync. I'd like to continue the discussion in this email thread.

The primary focus of the conversation is the proposed INTERVAL type's
*compatibility with Apache Arrow*. Several key issues have been raised:
1. Naming of DayTimeInterval
   While the name DayTimeInterval closely follows the SQL standard and
   matches naming conventions used by most engines, some suggest that a name
   emphasizing precision, such as DayNanoInterval, might provide better
   clarity.
2. Mapping DayTimeInterval to Arrow's MonthDayNano
   Mapping DayTimeInterval to Arrow's MonthDayNano type is problematic due to
   semantic differences:
   a) MonthDayNano combines both calendar-based and duration-based
      components, whereas DayTimeInterval represents a pure duration.
   b) MonthDayNano allows mixed signs across components (e.g., positive
      months and negative days), which complicates comparison and evaluation.
   Given these differences, MonthDayNano is not a suitable candidate for
   representing DayTimeInterval.
3. Memory Footprint: Is 16 bytes necessary for DayTimeInterval?
   a) Some engines (e.g., Spark, Trino) represent DayTimeInterval using only
      8 bytes, while others (like Oracle and Snowflake) support a wider
      range, potentially requiring more than 8 bytes. Additionally, there is
      interest in future support for higher precision, such as picoseconds,
      which would also demand a larger footprint.
   b) One proposal is to parameterize the size or precision, allowing engines
      to define their own representations. However, this approach introduces
      complexity and makes standardization difficult. A fixed-size format
      that provides enough range for most use cases is considered more
      robust.
   c) Several alternative strategies have been proposed:
      i) Use a 10-byte array, which is likely sufficient for all current
         engine requirements.
      ii) Use a 16-byte array now, with the option to evolve it into a
          standardized int128 in the future.
      iii) Start with an int64 representation, and plan for a future
           transition to int128, updating related types such as timestamps
           and intervals in parallel.
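
To make the footprint trade-off in point 3 more concrete, here is a rough
back-of-the-envelope sketch of the range each width would give. It assumes
the interval is stored as a single signed two's-complement count of
sub-second units, which is my simplification and not necessarily the encoding
in the proposal. The helper name max_whole_days is hypothetical.

```python
# Assumption: the interval is one signed integer counting sub-second units.
NANOS_PER_DAY = 86_400 * 10**9
PICOS_PER_DAY = 86_400 * 10**12


def max_whole_days(width_bytes: int, units_per_day: int) -> int:
    """Largest whole number of days a signed integer of this width can hold."""
    max_value = 2 ** (8 * width_bytes - 1) - 1
    return max_value // units_per_day


# Compare the three widths discussed above at nanosecond precision.
for width in (8, 10, 16):
    days = max_whole_days(width, NANOS_PER_DAY)
    print(f"{width:>2} bytes, nanoseconds: about {days / 365.25:.3g} years")

# The same 8 bytes at picosecond precision cover far less.
print("8 bytes, picoseconds:", max_whole_days(8, PICOS_PER_DAY), "days")
```

Under these assumptions, 8 bytes of nanoseconds cover roughly 292 years,
while 8 bytes of picoseconds cover only 106 days, which is why higher
precision pushes toward a wider representation.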

Looking forward to hearing your thoughts on this!

Link to the proposal:
https://docs.google.com/document/d/12ghQxWxyAhSQeZyy0IWiwJ02gTqFOgfYm8x851HZFLk/edit?tab=t.0
Link to the PR: https://github.com/apache/parquet-format/pull/496/files

Best Regards,
Yun
