emkornfield commented on a change in pull request #7805:
URL: https://github.com/apache/arrow/pull/7805#discussion_r457510267
##########
File path: cpp/src/arrow/python/inference.cc
##########
@@ -332,6 +329,13 @@ class TypeInferrer {
++int_count_;
} else if (PyDateTime_Check(obj)) {
++timestamp_micro_count_;
+ OwnedRef tzinfo(PyObject_GetAttrString(obj, "tzinfo"));
+ if (tzinfo.obj() != nullptr && tzinfo.obj() != Py_None &&
timezone_.empty()) {
+ // From public docs on array construction
+ // "Localized timestamps will currently be returned as UTC "
+ // representation). "
Review comment:
> It seems to get worse with this PR because non-UTC timestamps get
> tagged as UTC without being corrected for the timezone's offset, which is
> misleading.
That is not the intent of the PR. Right now everything gets corrected to
UTC. I can make the change to try to keep the original timezones in place.
As an example, the following keeps the times logically the same and
converts US/Eastern to the correct time in UTC:
```
>>> import datetime
>>> import pytz
>>> import pyarrow as pa
>>> now_with_tz = datetime.datetime(2020, 7, 20, 10, 21, 42, 96119, tzinfo=pytz.timezone('US/Eastern'))
>>> arr = pa.array([now_with_tz])
>>> arr.type.tz
'UTC'
>>> arr.to_pylist()
[datetime.datetime(2020, 7, 20, 15, 17, 42, 96119, tzinfo=<UTC>)]
>>> arr.to_pylist()[0].tzinfo
<UTC>
>>> arr.to_pandas()
0 2020-07-20 15:17:42.096119+00:00
dtype: datetime64[ns, UTC]
```
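For reference, the same conversion can be sketched with the standard library alone. This is a minimal sketch, not the PR's implementation: `to_utc`, `edt`, and `lmt` are hypothetical names, and the fixed offsets are assumptions standing in for US/Eastern. The -04:56 offset is the LMT offset that pytz applies when a zone is passed directly as `tzinfo=` rather than through `localize()`, which is why the output above reads 15:17:42 and not 14:21:42.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt):
    """Normalize an aware datetime to UTC; the instant itself is unchanged."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc)


# Fixed offsets standing in for US/Eastern (assumptions for illustration):
# -04:00 is EDT in July 2020; -04:56 is the LMT offset pytz uses when the
# zone object is passed directly as tzinfo= instead of via localize().
edt = timezone(timedelta(hours=-4))
lmt = timezone(timedelta(hours=-4, minutes=-56))

wall = (2020, 7, 20, 10, 21, 42, 96119)
as_edt = to_utc(datetime(*wall, tzinfo=edt))
as_lmt = to_utc(datetime(*wall, tzinfo=lmt))

print(as_edt.isoformat())
print(as_lmt.isoformat())
```

In both cases the UTC value compares equal to the original aware datetime, so the times stay logically the same. Only the wall-clock representation changes.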