emkornfield commented on a change in pull request #7805:
URL: https://github.com/apache/arrow/pull/7805#discussion_r457510267



##########
File path: cpp/src/arrow/python/inference.cc
##########
@@ -332,6 +329,13 @@ class TypeInferrer {
       ++int_count_;
     } else if (PyDateTime_Check(obj)) {
       ++timestamp_micro_count_;
+      OwnedRef tzinfo(PyObject_GetAttrString(obj, "tzinfo"));
+      if (tzinfo.obj() != nullptr && tzinfo.obj() != Py_None && timezone_.empty()) {
+        // From public docs on array construction
+        // "Localized timestamps will currently be returned as UTC "
+        //     representation). "

Review comment:
       > It seems to get worse with this PR because non-UTC timestamps get 
tagged as UTC without being corrected for the timezone's offset, which is 
misleading.
   
   That is not the intent of the PR. Right now everything gets corrected to
UTC, which keeps the times logically the same. I can make the change to try
to keep the original timezones in place. As an example, the US/Eastern
timestamp below is changed to the correct time in UTC:
   
   
   ```
   >>> now_with_tz = datetime.datetime(2020, 7, 20, 10, 21, 42, 96119, tzinfo=pytz.timezone('US/Eastern'))
   >>> arr = pa.array([now_with_tz]) 
   >>> arr.type.tz  
   'UTC'
   >>> arr.to_pylist() 
   [datetime.datetime(2020, 7, 20, 15, 17, 42, 96119, tzinfo=<UTC>)]
   >>> arr.to_pylist()[0].tzinfo
   <UTC>
   >>> arr.to_pandas()
   0   2020-07-20 15:17:42.096119+00:00
   dtype: datetime64[ns, UTC]
   ```
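
For reference, here is a minimal sketch of the "correct everything to UTC" behaviour described above, in plain Python. It is not the PR's C++ code. `to_utc` is a hypothetical helper, and a fixed -04:00 offset stands in for US/Eastern so that only the standard library is needed. For that reason the UTC wall-clock time it prints differs from the pytz-based output above.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt):
    """Convert an aware datetime to UTC; naive values pass through unchanged."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        return dt
    return dt.astimezone(timezone.utc)


# Assumption: a fixed -04:00 offset approximates US/Eastern in July (DST).
eastern_dst = timezone(timedelta(hours=-4))
local = datetime(2020, 7, 20, 10, 21, 42, 96119, tzinfo=eastern_dst)
as_utc = to_utc(local)

# Aware datetimes compare by instant, so the conversion must preserve equality.
assert as_utc == local
print(as_utc.isoformat())  # 2020-07-20T14:21:42.096119+00:00
```

The equality check is the point: the value is tagged as UTC and shifted by the original offset, so it still refers to the same instant.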
   
   



