tustvold opened a new issue, #3794: URL: https://github.com/apache/arrow-rs/issues/3794
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

`string_to_timestamp_nanos` contains logic to parse a timestamp-like string to nanoseconds since the UTC epoch. The semantics are well defined for timestamps that include a timezone, e.g. `1997-01-31 09:26:56.123Z` or `1997-01-31T09:26:56.123-05:00`. However, the semantics get confused for timestamps of the form `1997-01-31T09:26:56.123`.

As pointed out by @MachaelLee on https://github.com/apache/arrow-rs/pull/3787, prior to https://github.com/apache/arrow-rs/pull/2814 a timestamp string without a timezone would be interpreted as being in the system's local timezone, and this continues to be what the function docs state happens. This was changed in https://github.com/apache/arrow-rs/pull/2814 by @waitingkuo to instead parse such strings in the UTC timezone.

There are at least three "correct" behaviours when parsing strings without an embedded timezone, such as `1997-01-31T09:26:56.123`, depending on the context:

* When parsing a user-provided timestamp, use the system [`Local`](https://docs.rs/chrono/latest/chrono/offset/struct.Local.html) timezone and convert back to UTC (**what it used to do**)
* When parsing a string to a datatype without a timezone, assume UTC (**what it currently does**)
* When parsing a string to a Timestamp column with a timezone, assume the timestamp is in the given [Tz](https://docs.rs/arrow-array/latest/arrow_array/timezone/struct.Tz.html) and convert back to UTC

**Describe the solution you'd like**

Provide a function with the signature

```
pub fn string_to_datetime<T: TimeZone>(t: &T, s: &str) -> Result<DateTime<T>>
```

This could then be used with [Local](https://docs.rs/chrono/latest/chrono/offset/struct.Local.html), [Utc](https://docs.rs/chrono/latest/chrono/offset/struct.Utc.html), or [Tz](https://docs.rs/arrow-array/latest/arrow_array/timezone/struct.Tz.html) as appropriate.
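To make the three behaviours concrete, here is a minimal, std-only sketch of the idea behind the proposed signature: the caller supplies the zone to assume, and it is only consulted when the string carries no zone of its own. This is not the arrow-rs or chrono implementation. As a simplifying assumption, a fixed UTC offset in seconds stands in for the `TimeZone` argument (real zones such as `Local` or `Tz` have DST transitions that a fixed offset cannot model), the parser accepts only the fixed-width layout shown in the examples above, and the names `parse_timestamp_nanos`, `field`, and `days_from_civil` are hypothetical.

```rust
use std::ops::Range;

/// Hypothetical helper: days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y - era * 400; // year of era, 0..=399
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of (March-based) year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Hypothetical helper: parse a fixed-width numeric field of `s`.
fn field(s: &str, r: Range<usize>) -> Option<i64> {
    s.get(r)?.parse().ok()
}

/// Parse `YYYY-MM-DD[T ]HH:MM:SS[.fraction][Z|+HH:MM|-HH:MM]` into nanoseconds
/// since the UTC epoch. `default_offset_secs` is the offset from UTC to assume
/// when the string has no zone; an explicit zone in the string always wins.
/// Separators and field ranges are not validated in this sketch.
fn parse_timestamp_nanos(s: &str, default_offset_secs: i64) -> Option<i64> {
    if s.len() < 19 || !s.is_ascii() {
        return None;
    }
    let (y, mo, d) = (field(s, 0..4)?, field(s, 5..7)?, field(s, 8..10)?);
    let (h, mi, sec) = (field(s, 11..13)?, field(s, 14..16)?, field(s, 17..19)?);

    // Optional fractional seconds, up to nanosecond precision.
    let mut rest = &s[19..];
    let mut nanos = 0i64;
    if let Some(frac) = rest.strip_prefix('.') {
        let n = frac.bytes().take_while(|b| b.is_ascii_digit()).count();
        if n == 0 || n > 9 {
            return None;
        }
        nanos = frac[..n].parse::<i64>().ok()? * 10i64.pow(9 - n as u32);
        rest = &frac[n..];
    }

    // Explicit zone if present, otherwise the caller-supplied default.
    let offset_secs = match rest {
        "" => default_offset_secs,
        "Z" => 0,
        _ => {
            if rest.len() != 6 {
                return None;
            }
            let sign = match &rest[..1] {
                "+" => 1,
                "-" => -1,
                _ => return None,
            };
            sign * (field(rest, 1..3)? * 3600 + field(rest, 4..6)? * 60)
        }
    };

    // Wall-clock seconds in the assumed zone, shifted back to UTC.
    let wall_secs = days_from_civil(y, mo, d) * 86_400 + h * 3600 + mi * 60 + sec;
    Some((wall_secs - offset_secs) * 1_000_000_000 + nanos)
}
```

Under these assumptions, passing `0` corresponds to the "assume UTC" behaviour, while passing the offset of the local zone or of the column's zone corresponds to the other two. A string such as `1997-01-31T09:26:56.123-05:00` yields the same value regardless of the default.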
We could then update `string_to_timestamp_nanos` to be something like

```
pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> {
    to_timestamp_nanos(string_to_datetime(&Utc, s)?.naive_utc())
}
```

And possibly deprecate it, as it has rather confusing semantics.

**Describe alternatives you've considered**

**Additional context**
