Some experiments inspired by an SO post[1] led me to question the
meaning of time.  The main question is: **what happens when a time
value exceeds 24 hours?**

 A) One potential interpretation is that these values are invalid, but
neither the C++ implementation nor pyarrow rejects them today.  Nor do
they correct them.
 B) An alternative interpretation is to take the value modulo one UTC
day (e.g., 86400 if the unit is seconds) and use the result.

The (B) approach makes conversion from timestamp -> time trivial (just
a metadata change).  I think this is the correct, and preferred,
interpretation.  However, it would require all implementations to
interpret time in this way.  With that in mind, if we think this is
the correct approach, I'd like to clean up the docs.
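To make interpretation (B) concrete, here is a minimal sketch in plain
Python (no Arrow APIs involved).  The helper name `wrap_time_of_day` and
the unit strings are made up for this illustration; it only shows the
arithmetic, not how any implementation behaves today.

```python
from datetime import datetime, timezone

# Units per second for each time unit (names chosen for this sketch).
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}
SECONDS_PER_DAY = 86_400


def wrap_time_of_day(value: int, unit: str) -> int:
    """Interpretation (B): reduce a raw time value modulo one UTC day."""
    units_per_day = SECONDS_PER_DAY * UNITS_PER_SECOND[unit]
    # Python's % returns a non-negative result for a positive modulus,
    # so negative inputs also land in [0, units_per_day).
    return value % units_per_day


# 25 hours expressed in seconds wraps to 01:00:00.
assert wrap_time_of_day(25 * 3600, "s") == 3600

# A timestamp (seconds since the UNIX epoch, UTC) reduced the same way
# yields its time of day, which is why timestamp -> time would need no
# change to the stored values under (B).
ts = int(datetime(2021, 8, 13, 14, 30, 5, tzinfo=timezone.utc).timestamp())
assert wrap_time_of_day(ts, "s") == 14 * 3600 + 30 * 60 + 5
```

Under (A), by contrast, the first input would simply be flagged as
invalid rather than wrapped.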

Time32/Time64 are rather sparsely documented in Schema.fbs[2]:

> /// Time type. The physical storage type depends on the unit
> /// - SECOND and MILLISECOND: 32 bits
> /// - MICROSECOND and NANOSECOND: 64 bits
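As a rough illustration of why the question arises at all, the sketch
below computes how many whole days fit in each physical storage width.
It assumes signed 32-bit and 64-bit integers, as the Time32/Time64 names
suggest; the table and helper names are made up for this example.

```python
SECONDS_PER_DAY = 86_400
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

# (unit name, units per second, max value of the assumed storage type)
LAYOUTS = [
    ("SECOND", 1, INT32_MAX),
    ("MILLISECOND", 1_000, INT32_MAX),
    ("MICROSECOND", 1_000_000, INT64_MAX),
    ("NANOSECOND", 1_000_000_000, INT64_MAX),
]


def max_whole_days(units_per_second: int, storage_max: int) -> int:
    """Number of complete 24-hour days representable in the storage type."""
    return storage_max // (SECONDS_PER_DAY * units_per_second)


for name, units_per_second, storage_max in LAYOUTS:
    days = max_whole_days(units_per_second, storage_max)
    print(f"{name}: up to {days} whole days fit")
```

Every unit has room for values well past 24 hours, so the storage
format alone does not rule them out.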

I'm assuming they are based on Parquet's time type[3], which is
slightly better documented (note: I am including this for illustration;
I am not recommending any change to Parquet):

> TIME
>
> TIME is used for a logical time type without a date with millisecond or
> microsecond precision. The type has two type parameters: UTC adjustment
> (true or false) and unit (MILLIS or MICROS, NANOS).
>
> TIME with unit MILLIS is used for millisecond precision. It must annotate an
> int32 that stores the number of milliseconds after midnight.
>
> TIME with unit MICROS is used for microsecond precision. It must annotate an
> int64 that stores the number of microseconds after midnight.
>
> TIME with unit NANOS is used for nanosecond precision. It must annotate an
> int64 that stores the number of nanoseconds after midnight.
>
> The sort order used for TIME is signed.

The C++ docs have[4]:

> Concrete type class for 32-bit time data (as number of seconds or
> milliseconds since midnight)
>
> Concrete type class for 64-bit time data (as number of microseconds or
> nanoseconds since midnight)

[1] https://stackoverflow.com/questions/68766837/how-to-cast-pyarrow-timestamp-dtype-to-time64-type/68767194?noredirect=1#comment121553900_68767194
[2] https://github.com/apache/arrow/blob/4591d76fce2846a29dac33bf01e9ba0337b118e9/format/Schema.fbs#L209-L215
[3] https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#time
[4] https://arrow.apache.org/docs/cpp/api/datatype.html?highlight=time#time-related
