Hi all,

I'm working on type conversions between different systems, and the details of the Arrow time and date data types raised some questions about their behaviour and its potential impact on interoperability:
*Question 1*: For my own understanding: what purpose does the millisecond date64 type serve?

*Question 2* relates to the definition and implementation of the date64 data type. The definition of date64 in Schema.fbs[1] is:

*Milliseconds (64 bits) indicating UNIX time elapsed since the epoch (no leap seconds), where the values are evenly divisible by 86400000*

However, in PyArrow I can create Date64 instances from integer input values that are not evenly divisible by 86400000, and the original input value persists in the stored Arrow data. That seems very counterintuitive and a potential cause of bugs in low-level transformations and when moving data between systems with Arrow. Shouldn't (Py)Arrow either reject the input, or convert it when explicitly asked to?

>>> pa.scalar(86499999, pa.date64())
<pyarrow.Date64Scalar: datetime.date(1970, 1, 2)>
>>> pa.scalar(86499999, pa.date64()).cast(pa.int64())
<pyarrow.Int64Scalar: 86499999>

*Question 3*: Both the time32 and time64 time-of-day types, in either precision, accept and store integer input that falls outside the 24-hour window. Like the date64 issue above, this seems like unexpected behaviour, possibly even impacting interoperability. I expected the boundaries of these values to be enforced. What is the desirable behaviour from the Arrow specification's perspective? Is it the current behaviour, or should the input either be rejected or explicitly converted? See:

>>> pa.scalar(-1, pa.time32('s'))  # expected: exception or warning
<pyarrow.Time32Scalar: datetime.time(23, 59, 59)>
>>> pa.scalar(-1, pa.time32('s')).cast(pa.int32())  # expected: 86399
<pyarrow.Int32Scalar: -1>
>>> pa.scalar(86400, pa.time32('s'))  # expected: exception or warning
<pyarrow.Time32Scalar: datetime.time(0, 0)>
>>> pa.scalar(86400, pa.time32('s')).cast(pa.int32())  # expected: 0
<pyarrow.Int32Scalar: 86400>

I'm looking for answers to understand the intended behaviour.
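To make the "convert explicitly" option concrete, here is a minimal plain-Python sketch of the normalisation I would expect such a conversion to perform. The function names are mine for illustration and are not part of any Arrow API; this is only arithmetic, not a proposed implementation:

```python
DAY_MS = 86_400_000  # milliseconds per day (date64 divisibility requirement)
DAY_S = 86_400       # seconds per day (time32('s') valid range is [0, 86400))

def normalize_date64(ms: int) -> int:
    """Floor a millisecond value to the previous midnight, so the result
    is evenly divisible by 86400000 as the spec requires.
    Python floor division also handles pre-epoch (negative) values."""
    return (ms // DAY_MS) * DAY_MS

def normalize_time32_s(seconds: int) -> int:
    """Wrap a seconds value into the [0, 86400) time-of-day window."""
    return seconds % DAY_S

# The examples from the questions above:
normalize_date64(86_499_999)  # -> 86_400_000, i.e. 1970-01-02 00:00:00
normalize_time32_s(-1)        # -> 86_399, i.e. 23:59:59
normalize_time32_s(86_400)    # -> 0, i.e. 00:00:00
```

This matches what the Date64Scalar and Time32Scalar reprs above already display, which is why storing the raw, un-normalised integer underneath seems surprising to me.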
If questions 2 and 3 are actually issues with the implementations, let me know and I'll raise them on GitHub (or Jira, if that's where they belong).

Thanks,

Marnix van den Broek
Data Engineer at bundlesandbatches.io

[1] https://github.com/apache/arrow/blob/4591d76fce2846a29dac33bf01e9ba0337b118e9/format/Schema.fbs#L200-L201