nealrichardson commented on a change in pull request #7604:
URL: https://github.com/apache/arrow/pull/7604#discussion_r449110010
##########
File path: cpp/src/arrow/python/arrow_to_pandas.cc
##########
@@ -642,24 +641,27 @@ inline Status ConvertStruct(const PandasOptions& options,
const ChunkedArray& da
std::vector<OwnedRef> fields_data(num_fields);
OwnedRef dict_item;
- // XXX(wesm): In ARROW-7723, we found as a result of ARROW-3789 that second
+ // In ARROW-7723, we found as a result of ARROW-3789 that second
// through microsecond resolution tz-aware timestamps were being promoted to
// use the DATETIME_NANO_TZ conversion path, yielding a datetime64[ns] NumPy
// array in this function. PyArray_GETITEM returns datetime.datetime for
// units second through microsecond but PyLong for nanosecond (because
- // datetime.datetime does not support nanoseconds). We inserted this hack to
- // preserve the <= 0.15.1 behavior until a better solution can be devised
+ // datetime.datetime does not support nanoseconds).
+ // We force the object conversion to preserve the value of the timezone.
PandasOptions modified_options = options;
- modified_options.ignore_timezone = true;
modified_options.coerce_temporal_nanoseconds = false;
for (int c = 0; c < data.num_chunks(); c++) {
auto arr = checked_cast<const StructArray*>(data.chunk(c).get());
// Convert the struct arrays first
for (int32_t i = 0; i < num_fields; i++) {
+ // Se notes above about conversion.
+ std::shared_ptr<Array> field = arr->field(static_cast<int>(i));
+ modified_options.timestamp_as_object =
+ field->type()->id() == Type::TIMESTAMP &&
+ !checked_cast<const
TimestampType&>(*field->type()).timezone().empty();
Review comment:
Yeah, there's the same issue in R too. In R at least, tz-naive data is
localized when it is printed, so it might *look* inconsistent, but the data
itself is not modified. I think that part is important--we shouldn't change
whether the data is timezone naive when we convert it to/from Arrow. When I
fixed that in the R bindings, rather than assuming UTC, all of these things
that I thought were problems went away.
(Full disclosure: I have not looked at what this PR is doing, just chiming
in on the discussion based on having faced what sounds like similar issues in
the R bindings.)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]