emkornfield commented on a change in pull request #7805:
URL: https://github.com/apache/arrow/pull/7805#discussion_r457510267



##########
File path: cpp/src/arrow/python/inference.cc
##########
@@ -332,6 +329,13 @@ class TypeInferrer {
       ++int_count_;
     } else if (PyDateTime_Check(obj)) {
       ++timestamp_micro_count_;
+      OwnedRef tzinfo(PyObject_GetAttrString(obj, "tzinfo"));
+      if (tzinfo.obj() != nullptr && tzinfo.obj() != Py_None && timezone_.empty()) {
+        // From public docs on array construction
+        // "Localized timestamps will currently be returned as UTC "
+        //     representation). "

Review comment:
       > It seems to get worse with this PR because non-UTC timestamps get tagged as UTC without being corrected for the timezone's offset, which is misleading.
   
   That is not the intent of the PR. Right now everything gets converted to UTC, 
but in my environment the UTC tzinfo is maintained. I can't replicate your 
results above: timezone information is propagated for me. As an example with 
`now_with_tz`:
   
   ```
   >>> import datetime
   >>> import pyarrow as pa
   >>> import pytz
   >>> now_with_tz = datetime.datetime(2020, 7, 20, 10, 21, 42, 96119, tzinfo=pytz.timezone('US/Eastern'))
   >>> arr = pa.array([now_with_tz]) 
   >>> arr.type.tz  
   'UTC'
   >>> arr.to_pylist() 
   [datetime.datetime(2020, 7, 20, 15, 17, 42, 96119, tzinfo=<UTC>)]
   >>> arr.to_pylist()[0].tzinfo
   <UTC>
   >>> arr.to_pandas()
   0   2020-07-20 15:17:42.096119+00:00
   dtype: datetime64[ns, UTC]
   ```
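
To make the distinction in the quoted concern concrete, here is a minimal stdlib-only sketch of "converted to UTC" versus "tagged as UTC without correcting for the offset". It does not use pyarrow and is not the PR's implementation. The fixed -04:00 offset is an assumption standing in for US/Eastern in July, and the names `eastern`, `converted`, and `relabeled` are hypothetical. (The 15:17 in the session above is consistent with pytz applying the zone's local-mean-time offset of -4:56 when a pytz timezone is passed directly as `tzinfo`; with a plain -04:00 offset the converted hour would be 14:21.)

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed -04:00 offset stands in for US/Eastern in July (EDT).
eastern = timezone(timedelta(hours=-4), "EDT")
local = datetime(2020, 7, 20, 10, 21, 42, 96119, tzinfo=eastern)

# Conversion: the wall-clock time is shifted by the offset, so the
# result denotes the same instant, expressed in UTC.
converted = local.astimezone(timezone.utc)

# Relabeling: the wall-clock time is kept and only the tzinfo is swapped,
# which silently moves the instant by four hours.
relabeled = local.replace(tzinfo=timezone.utc)

print(converted.isoformat())   # 2020-07-20T14:21:42.096119+00:00
print(relabeled.isoformat())   # 2020-07-20T10:21:42.096119+00:00
print(converted == local)      # True: same instant
print(relabeled == local)      # False: different instant
```

Comparing the array's values against `converted` rather than `relabeled` is one way to check which of the two behaviors a given build exhibits.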
   
   



