emkornfield commented on a change in pull request #7604:
URL: https://github.com/apache/arrow/pull/7604#discussion_r449130523



##########
File path: cpp/src/arrow/python/arrow_to_pandas.cc
##########
@@ -642,24 +641,27 @@ inline Status ConvertStruct(const PandasOptions& options, 
const ChunkedArray& da
   std::vector<OwnedRef> fields_data(num_fields);
   OwnedRef dict_item;
 
-  // XXX(wesm): In ARROW-7723, we found as a result of ARROW-3789 that second
+  // In ARROW-7723, we found as a result of ARROW-3789 that second
   // through microsecond resolution tz-aware timestamps were being promoted to
   // use the DATETIME_NANO_TZ conversion path, yielding a datetime64[ns] NumPy
   // array in this function. PyArray_GETITEM returns datetime.datetime for
   // units second through microsecond but PyLong for nanosecond (because
-  // datetime.datetime does not support nanoseconds). We inserted this hack to
-  // preserve the <= 0.15.1 behavior until a better solution can be devised
+  // datetime.datetime does not support nanoseconds).
+  // We force the object conversion to preserve the value of the timezone.
   PandasOptions modified_options = options;
-  modified_options.ignore_timezone = true;
   modified_options.coerce_temporal_nanoseconds = false;
 
   for (int c = 0; c < data.num_chunks(); c++) {
     auto arr = checked_cast<const StructArray*>(data.chunk(c).get());
     // Convert the struct arrays first
     for (int32_t i = 0; i < num_fields; i++) {
+      // Se notes above about conversion.
+      std::shared_ptr<Array> field = arr->field(static_cast<int>(i));
+      modified_options.timestamp_as_object =
+          field->type()->id() == Type::TIMESTAMP &&
+          !checked_cast<const 
TimestampType&>(*field->type()).timezone().empty();

Review comment:
       > Who does this? Does arrow do this in conversion? Or do you mean that 
PyArray_GETITEM does this? (AFAIK numpy used to do things with system time 
zone, but that should be solved nowadays)
   
   Python's [datetime.astimezone assumes 
this](https://docs.python.org/3/library/datetime.html#datetime.datetime.astimezone)
 as of 3.6.  Since datetime essentially becomes the "display" I would think 
most people would assume system timezone values.
   
   > > Our timestamp -> datetime conversion assumes UTC as far as I can tell 
(like I said not sure what pandas does here).
   
   > Where do we assume this? (and this is about naive or aware timestamps?)
   
   Core [timestamp conversion doesn't handle 
timezones](https://github.com/apache/arrow/blob/edd88d7d222598550e4812c94194cbf973b20456/cpp/src/arrow/python/datetime.cc#L246).
  for non-naive we use 
[fromutc](https://github.com/apache/arrow/blob/89cf7bdd6fc0d34c96f1a0e1df0a3505b308188d/python/pyarrow/scalar.pxi#L310)
 after the fact.
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to