pitrou commented on a change in pull request #7805:
URL: https://github.com/apache/arrow/pull/7805#discussion_r457436813
##########
File path: cpp/src/arrow/python/inference.cc
##########
@@ -332,6 +329,13 @@ class TypeInferrer {
++int_count_;
} else if (PyDateTime_Check(obj)) {
++timestamp_micro_count_;
+ OwnedRef tzinfo(PyObject_GetAttrString(obj, "tzinfo"));
+ if (tzinfo.obj() != nullptr && tzinfo.obj() != Py_None &&
timezone_.empty()) {
+ // From public docs on array construction
+ // "Localized timestamps will currently be returned as UTC "
+ // representation). "
Review comment:
If I reuse the nomenclature from the test, I get without this PR (it's
14h21 UTC currently):
```python
>>> now_utc
datetime.datetime(2020, 7, 20, 14, 21, 42, 96119, tzinfo=<UTC>)
>>> arr = pa.array([now_utc])
>>> arr
<pyarrow.lib.TimestampArray object at 0x7f44b0bfcc90>
[
2020-07-20 14:21:42.096119
]
>>> arr.type.tz
>>> arr.to_pylist()
[datetime.datetime(2020, 7, 20, 14, 21, 42, 96119)]
>>> arr.to_pandas()
0 2020-07-20 14:21:42.096119
dtype: datetime64[ns]
```
```python
>>> now_with_tz
datetime.datetime(2020, 7, 20, 10, 21, 42, 96119, tzinfo=<DstTzInfo
'US/Eastern' EDT-1 day, 20:00:00 DST>)
>>> arr = pa.array([now_with_tz])
>>> arr.type.tz
>>> arr.to_pylist()
[datetime.datetime(2020, 7, 20, 10, 21, 42, 96119)]
>>> arr.to_pylist()[0].tzinfo
>>> arr.to_pandas()
0 2020-07-20 10:21:42.096119
dtype: datetime64[ns]
```
So without the PR, there's a problem for non-UTC timestamps which lose the
timezone information. It seems to get worse with this PR because non-UTC
timestamps get tagged as UTC without being corrected for the timezone's offset,
which is misleading. At least originally you may be alerted by the absence of a
timezone on the type (in Python terms, it's a "naive" timestamp).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]