On 16/08/2021 at 20:52, Weston Pace wrote:
> Some experiments inspired by an SO post[1] led me to question the meaning of time. The main question is **what happens when the value exceeds 24 hours?**
>
> A) One potential interpretation is that these values are invalid, but neither the C++ implementation nor pyarrow rejects them today. Nor do they correct them.
>
> B) An alternative interpretation is to take the value modulo one UTC day (e.g., 86400 if the unit is seconds) and use the result. The (B) approach makes conversion from timestamp -> time trivial (just a metadata change). I think this is the correct, and preferred, interpretation. However, it would require all implementations to interpret time in this way.
>
> With that in mind, if we think this is the correct approach, I'd like to clean up the docs.
(B) doesn't make sense at all to me. Really, (A) is the only reasonable interpretation.
We don't check data at IO boundaries by default, since that would be expensive (for example, we don't check for valid UTF8). However, see https://issues.apache.org/jira/browse/ARROW-10924 for explicit temporal data validation in C++.
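
To make the two interpretations concrete, here is a minimal sketch in plain Python for time values stored as seconds. The function names are hypothetical and chosen for illustration only; this is not the Arrow C++ or pyarrow API, and it is not meant to describe what the validation in ARROW-10924 actually does.

```python
SECONDS_PER_DAY = 86_400  # one UTC day, ignoring leap seconds


def is_valid_time_seconds(value: int) -> bool:
    """Interpretation (A): a time of day must lie in [0, 86400)."""
    return 0 <= value < SECONDS_PER_DAY


def wrap_time_seconds(value: int) -> int:
    """Interpretation (B): reduce the value modulo one UTC day."""
    return value % SECONDS_PER_DAY


def find_invalid(values):
    """Return the indices of values that are invalid under (A).

    None stands for a null slot and is skipped.
    """
    return [
        i
        for i, v in enumerate(values)
        if v is not None and not is_valid_time_seconds(v)
    ]
```

Under (A), `find_invalid` would flag a value such as 90000 as an error; under (B), the same value would be read as 3600, i.e. 01:00:00. A check like this touches every value in the column, which is why it would be an explicit, opt-in step rather than something done at every IO boundary.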
Regards,

Antoine.