[
https://issues.apache.org/jira/browse/ARROW-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437877#comment-17437877
]
Joris Van den Bossche edited comment on ARROW-14567 at 11/3/21, 10:01 AM:
--------------------------------------------------------------------------
The PrettyPrint for the Timestamp type is implemented in
{{StringFormatter<TimestampType>}} in {{formatting.h}}
(https://github.com/apache/arrow/blob/bf67ec74635db2183619601f025e4724bd5a6b75/cpp/src/arrow/util/formatting.h#L431-L473).
Currently this ignores any timezone information on the type and simply formats
the stored UTC epoch value. Personally, I find this confusing, as the current
repr of the array gives no indication whatsoever of how to interpret the
printed time values (and without any indication, my first expectation would be
local time in the timezone of the type, which isn't the case).
The formatting code already uses the date.h utilities (which we have vendored
anyway), so in principle we could use date.h to first localize the epoch
value. However, that makes printing dependent on finding a timezone database
(e.g. not yet supported on Windows at the moment).
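To illustrate what "localizing the epoch value" would mean for the example in the issue, here is a minimal sketch in Python using only the standard library. It is not the Arrow C++ implementation; the helper name {{format_localized}} is hypothetical, and it assumes a fixed UTC offset such as {{+02:00}}, which needs no timezone database. Named zones (e.g. {{Europe/Brussels}}) would need a database lookup, which is the dependency mentioned above.

```python
from datetime import datetime, timedelta, timezone


def format_localized(epoch_seconds: int, utc_offset_hours: int) -> str:
    """Hypothetical helper: render a UTC epoch as wall-clock time at a fixed offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    # Interpret the stored value as UTC, then shift it to the target offset.
    local = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).astimezone(tz)
    return local.strftime("%Y-%m-%d %H:%M:%S")


# Epoch 0 with a +02:00 type, as in the issue's example
print(format_localized(0, 2))  # prints "1970-01-01 02:00:00"
```

With this approach the printed value for the issue's array would be 02:00:00 rather than 00:00:00.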
An alternative could be to keep the printed value in UTC but add a {{+00:00}}
indication to make it clear that these are the UTC times being printed (and not
the wall clock time in the timezone of the type).
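As a rough sketch of this alternative, again in plain Python rather than the actual C++ formatter, the value stays in UTC and an explicit offset is appended. The helper name {{format_utc_with_offset}} is hypothetical and only meant to show the resulting output shape.

```python
from datetime import datetime, timezone


def format_utc_with_offset(epoch_seconds: int) -> str:
    """Hypothetical helper: render a UTC epoch with an explicit +00:00 suffix."""
    utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # isoformat() on a timezone-aware datetime appends the UTC offset.
    return utc.isoformat(sep=" ")


print(format_utc_with_offset(0))  # prints "1970-01-01 00:00:00+00:00"
```

This needs no timezone database, since the offset is always UTC regardless of the timezone of the type.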
cc [~rokm] [~apitrou] [~lidavidm]
was (Author: jorisvandenbossche):
The PrettyPrint for the Timestamp type is implemented in
{{StringFormatter<TimestampType>}} in {{formatting.h}}
(https://github.com/apache/arrow/blob/bf67ec74635db2183619601f025e4724bd5a6b75/cpp/src/arrow/util/formatting.h#L431-L473).
Currently this ignores any timezone information on the type and simply formats
the stored UTC epoch value.
The formatting code already uses the date.h utilities (which we have vendored
anyway), so in principle we could use date.h to first localize the epoch
value. However, that makes printing dependent on finding a timezone database
(e.g. not yet supported on Windows at the moment).
An alternative could be to keep the printed value in UTC but add a {{+00:00}}
indication to make it clear that these are the UTC times being printed (and not
the wall clock time in the timezone of the type).
cc [~rokm] [~apitrou] [~lidavidm]
> [C++][Python] PrettyPrint ignores timezone
> ------------------------------------------
>
> Key: ARROW-14567
> URL: https://issues.apache.org/jira/browse/ARROW-14567
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Alenka Frim
> Priority: Major
>
> When printing a TimestampArray in pyarrow, the timezone information is ignored
> by PrettyPrint ({{__str__}} calls {{to_string()}} in {{array.pxi}}).
> {code:python}
> import pyarrow as pa
> a = pa.array([0], pa.timestamp('s', tz='+02:00'))
> print(a) # representation not correct?
> # <pyarrow.lib.TimestampArray object at 0x7f834c7cb9a8>
> # [
> # 1970-01-01 00:00:00
> # ]
> {code}