jorisvandenbossche commented on pull request #10647:
URL: https://github.com/apache/arrow/pull/10647#issuecomment-901783928


   Sorry for the slow response here, but I think there are still a few 
behavioural aspects to fix/clarify:
   
   - Related to @westonpace's comment above 
(https://github.com/apache/arrow/pull/10647#discussion_r668364905), you added a 
"Z" to the default format. However, this is only correct if you have a UTC 
timezone, and not for any other timezone. For example:
       ```python
       >>> ts = pd.to_datetime(["2018-03-10 09:00"]).tz_localize("US/Eastern")
       >>> ts
       DatetimeIndex(['2018-03-10 09:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)
       >>> tsa = pa.array(ts)
       >>> tsa
       <pyarrow.lib.TimestampArray object at 0x7f7350b087c0>
       [
         2018-03-10 14:00:00.000000000
       ]
   
       >>> pc.strftime(tsa)
       <pyarrow.lib.StringArray object at 0x7f7350b74a60>
       [
         "2018-03-10T09:00:00.000000000Z"
       ]
       ```
     So it's correctly showing the timestamp in the timezone's local time, but 
that makes the "Z" indicator for UTC wrong (the correct UTC time is 14:00, not 
09:00). 
     I think we should only add the "Z" indicator if the timezone is UTC. I am 
not fully sure what we should then use as default format for non-UTC timezones 
though: don't show any timezone information, include a numeric offset, or 
error. 
     That would also mean that the "default" format string would depend on the 
input type of the data, which might not be easy / desirable.
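
To make the "numeric offset" option concrete, here is a minimal sketch using only Python's standard library rather than Arrow. The helper name `format_with_zone` is hypothetical, and a fixed -05:00 offset stands in for US/Eastern on that date. It only illustrates the rule "append Z when the offset is zero, otherwise a numeric offset"; it is not a proposal for how the kernel should be implemented.

```python
from datetime import datetime, timedelta, timezone


def format_with_zone(dt: datetime) -> str:
    """Hypothetical helper: append 'Z' only when the UTC offset is zero."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    base = dt.strftime("%Y-%m-%dT%H:%M:%S")
    if dt.utcoffset() == timedelta(0):
        return base + "Z"
    # Non-UTC zone: fall back to a numeric offset such as -0500.
    return base + dt.strftime("%z")


# Assumption: fixed offset standing in for US/Eastern on 2018-03-10.
eastern = timezone(timedelta(hours=-5))
local = datetime(2018, 3, 10, 9, 0, tzinfo=eastern)

local_str = format_with_zone(local)
utc_str = format_with_zone(local.astimezone(timezone.utc))

print(local_str)
print(utc_str)
```

Both strings describe the same instant; only the second one can legitimately carry the "Z" suffix.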
   
   - I commented about the timezone handling when the initial PR had a keyword 
for this, but I forgot to reply after you removed that keyword (and support for 
local timestamps) altogether. But, what's the reasoning for disallowing local 
timestamps without timezone? I don't think there is any ambiguity in how they 
would be formatted? (after all, they represent "clock" time, which in the end 
is kind of a formatted string)
   
   - There was some discussion above about the behaviour of `%S` 
(https://github.com/apache/arrow/pull/10647#discussion_r670410876), where 
`date.h` / C++ handles it differently from Python or R (i.e. we are including the 
fractional sub-second decimals, and there is no easy way to only show integer 
seconds apart from casting to `timestamp("s")` first AFAIK). 
     Since there are conflicting standards vs language implementations, there 
is no easy way to solve this. But I think it would be good to at least document 
this difference (it will be surprising for Python/R users) and how to 
work around it.
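
For reference when documenting this, the Python side of the `%S` difference can be shown with the standard library alone. This is just a sketch of what Python users are used to, not of the Arrow kernel: `%S` prints integer seconds, and the fraction only appears when `%f` is requested explicitly. Dropping the sub-second part before formatting is the stdlib analogue of the cast to `timestamp("s")` mentioned above.

```python
from datetime import datetime

ts = datetime(2018, 3, 10, 9, 0, 5, 123456)

# Python's %S never includes the fractional part.
seconds_only = ts.strftime("%H:%M:%S")

# The fraction has to be requested separately with %f (microseconds).
with_fraction = ts.strftime("%H:%M:%S.%f")

# Truncating to whole seconds before formatting, analogous to
# casting to a seconds resolution first.
truncated = ts.replace(microsecond=0)

print(seconds_only)
print(with_fraction)
print(truncated.isoformat())
```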

