emkornfield commented on a change in pull request #7805:
URL: https://github.com/apache/arrow/pull/7805#discussion_r457510267
##########
File path: cpp/src/arrow/python/inference.cc
##########
@@ -332,6 +329,13 @@ class TypeInferrer {
++int_count_;
} else if (PyDateTime_Check(obj)) {
++timestamp_micro_count_;
+ OwnedRef tzinfo(PyObject_GetAttrString(obj, "tzinfo"));
+ if (tzinfo.obj() != nullptr && tzinfo.obj() != Py_None &&
timezone_.empty()) {
+ // From public docs on array construction
+ // "Localized timestamps will currently be returned as UTC "
+ // representation). "
Review comment:
> It seems to get worse with this PR because non-UTC timestamps get
> tagged as UTC without being corrected for the timezone's offset, which is
> misleading.
That is not the intent of the PR. Right now everything gets converted to UTC,
and in my environment the UTC tzinfo is maintained. I can't replicate your
results above: for me the timezone information is propagated, as this example
with now_with_tz shows:
```
>>> import datetime
>>> import pytz
>>> import pyarrow as pa
>>> now_with_tz = datetime.datetime(2020, 7, 20, 10, 21, 42, 96119,
...                                 tzinfo=pytz.timezone('US/Eastern'))
>>> arr = pa.array([now_with_tz])
>>> arr.type.tz
'UTC'
>>> arr.to_pylist()
[datetime.datetime(2020, 7, 20, 15, 17, 42, 96119, tzinfo=<UTC>)]
>>> arr.to_pylist()[0].tzinfo
<UTC>
>>> arr.to_pandas()
0 2020-07-20 15:17:42.096119+00:00
dtype: datetime64[ns, UTC]
```
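To make the distinction in the quoted concern concrete, here is a minimal stdlib-only sketch of "converted to UTC" versus "tagged as UTC without being corrected for the offset". The fixed UTC-04:00 offset is an assumption standing in for a real zone, and the names `converted` and `relabeled` are illustrative only. This does not show what the inference code in this PR does.

```python
import datetime

# Assumption: a fixed UTC-04:00 offset stands in for a real timezone.
eastern = datetime.timezone(datetime.timedelta(hours=-4))
local = datetime.datetime(2020, 7, 20, 10, 21, 42, 96119, tzinfo=eastern)

# Converted: shift the wall-clock time by the offset, then label it UTC.
converted = local.astimezone(datetime.timezone.utc)

# Relabeled: keep the wall-clock time and only swap the tzinfo label.
relabeled = local.replace(tzinfo=datetime.timezone.utc)

print(converted.isoformat())  # 2020-07-20T14:21:42.096119+00:00
print(relabeled.isoformat())  # 2020-07-20T10:21:42.096119+00:00

# Only the converted value still refers to the same instant as the input.
print(converted == local)  # True
print(relabeled == local)  # False
```

If the array values compare equal to the original aware datetimes, the offset was applied. If only the hour and minute fields match, the value was merely relabeled.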