tustvold opened a new issue, #3794:
URL: https://github.com/apache/arrow-rs/issues/3794

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   `string_to_timestamp_nanos` contains logic to parse a timestamp-like string 
into nanoseconds since the UTC epoch.
   
   The semantics are well defined for timestamps that include a timezone, 
e.g. `1997-01-31 09:26:56.123Z` or `1997-01-31T09:26:56.123-05:00`. However, 
they are ambiguous for timestamps without one, such as 
`1997-01-31T09:26:56.123`.
   
   As pointed out by @MachaelLee on 
https://github.com/apache/arrow-rs/pull/3787, prior to 
https://github.com/apache/arrow-rs/pull/2814 a timestamp string without a 
timezone was interpreted as being in the system's local timezone, and the 
function docs still describe that behaviour. 
https://github.com/apache/arrow-rs/pull/2814 by @waitingkuo changed this so 
that such strings are instead parsed as UTC.
   
   There are at least three "correct" behaviours when parsing a string without 
an embedded timezone, such as `1997-01-31T09:26:56.123`, depending on the 
context:
   
   * When parsing a user-provided timestamp, interpret it in the system 
[`Local`](https://docs.rs/chrono/latest/chrono/offset/struct.Local.html) 
timezone and convert to UTC (**what it used to do**)
   * When parsing a string to a datatype without a timezone, assume UTC 
(**what it currently does**)
   * When parsing a string to a Timestamp column with a timezone, assume 
the timestamp is in the given 
[Tz](https://docs.rs/arrow-array/latest/arrow_array/timezone/struct.Tz.html) 
and convert to UTC
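   
To make the ambiguity concrete, here is a minimal std-only sketch of how the same timezone-less wall-clock value maps to different UTC instants depending on the offset assumed. `NaiveTimestamp`, `days_from_civil` and `to_utc_nanos` are hypothetical names for illustration, not arrow-rs or chrono APIs, and a fixed offset in seconds stands in for a real timezone (so DST is ignored).

```rust
/// Wall-clock fields of a timestamp that carries no timezone (hypothetical type).
struct NaiveTimestamp {
    year: i64,
    month: i64,
    day: i64,
    hour: i64,
    minute: i64,
    second: i64,
    nanos: i64,
}

/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    // Treat January and February as months 11 and 12 of the previous year.
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + d - 1; // day of year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Interprets `ts` as local time at `offset_secs` east of UTC and returns
/// nanoseconds since the UTC epoch.
fn to_utc_nanos(ts: &NaiveTimestamp, offset_secs: i64) -> i64 {
    let days = days_from_civil(ts.year, ts.month, ts.day);
    let local_secs = days * 86_400 + ts.hour * 3_600 + ts.minute * 60 + ts.second;
    // Local time = UTC + offset, so UTC = local - offset.
    (local_secs - offset_secs) * 1_000_000_000 + ts.nanos
}
```

With `1997-01-31T09:26:56.123`, assuming UTC (offset `0`) gives `854_702_816_123_000_000`, while assuming `-05:00` (offset `-18_000`) gives a value five hours later, `854_720_816_123_000_000`.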
   
   **Describe the solution you'd like**
   
   Provide a function with the signature
   
   ```
   pub fn string_to_datetime<T: TimeZone>(t: &T, s: &str) -> Result<DateTime<T>, ArrowError>
   ```
   
   This could then be used with 
[Local](https://docs.rs/chrono/latest/chrono/offset/struct.Local.html), 
[Utc](https://docs.rs/chrono/latest/chrono/offset/struct.Utc.html), or 
[Tz](https://docs.rs/arrow-array/latest/arrow_array/timezone/struct.Tz.html) as 
appropriate.
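   
As a rough illustration of the intended rule (an offset embedded in the string wins, otherwise the caller-supplied timezone decides), here is a std-only sketch. The `DefaultOffset` trait and the `UtcZone` and `FixedZone` types are hypothetical stand-ins for chrono's `TimeZone`, `Utc` and a fixed-offset zone, and `split_offset` is an illustrative helper rather than the proposed function itself. It only separates the naive part of the string from the offset to apply, does not validate the date-time fields, and ignores DST.

```rust
/// Hypothetical stand-in for a timezone: supplies a UTC offset in seconds.
trait DefaultOffset {
    fn offset_secs(&self) -> i64;
}

/// Hypothetical UTC zone.
struct UtcZone;
impl DefaultOffset for UtcZone {
    fn offset_secs(&self) -> i64 {
        0
    }
}

/// Hypothetical fixed-offset zone, in seconds east of UTC.
struct FixedZone(i64);
impl DefaultOffset for FixedZone {
    fn offset_secs(&self) -> i64 {
        self.0
    }
}

/// Splits `s` into its naive date-time part and the offset to apply.
/// An embedded `Z` or `+HH:MM` / `-HH:MM` suffix takes precedence over `tz`.
fn split_offset<'a, T: DefaultOffset>(tz: &T, s: &'a str) -> Result<(&'a str, i64), String> {
    if let Some(naive) = s.strip_suffix('Z') {
        return Ok((naive, 0));
    }
    // Search for a sign only after the date/time separator, so the hyphens
    // inside the date are never mistaken for an offset.
    let time_start = s
        .find(|c: char| c == 'T' || c == ' ')
        .ok_or_else(|| format!("no time component in {:?}", s))?;
    match s[time_start..].rfind(|c: char| c == '+' || c == '-') {
        Some(rel) => {
            let (naive, suffix) = s.split_at(time_start + rel);
            let sign = if suffix.starts_with('-') { -1 } else { 1 };
            let (hh, mm) = suffix[1..]
                .split_once(':')
                .ok_or_else(|| format!("bad offset {:?}", suffix))?;
            let hh = hh.parse::<i64>().map_err(|_| format!("bad offset {:?}", suffix))?;
            let mm = mm.parse::<i64>().map_err(|_| format!("bad offset {:?}", suffix))?;
            Ok((naive, sign * (hh * 3_600 + mm * 60)))
        }
        // No embedded offset: fall back to the caller-supplied zone.
        None => Ok((s, tz.offset_secs())),
    }
}
```

Under these assumptions, `split_offset(&FixedZone(3600), "1997-01-31T09:26:56.123-05:00")` uses the embedded `-05:00`, whereas the same call with `"1997-01-31T09:26:56.123"` falls back to the zone's `+01:00`.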
   
   We could then update `string_to_timestamp_nanos` to be something like
   
   ```
   pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> {
       // `string_to_datetime` takes the timezone by reference
       to_timestamp_nanos(string_to_datetime(&Utc, s)?.naive_utc())
   }
   ```
   
   We could also possibly deprecate it, as it has rather confusing semantics.
   
   **Describe alternatives you've considered**
   
   **Additional context**
   

