jorisvandenbossche commented on issue #41268:
URL: https://github.com/apache/arrow/issues/41268#issuecomment-2063190391
Casting a string to a timestamp is essentially parsing the string
(`strptime`), and for that we currently don't allow parsing a non-tz-aware
string to a tz-aware timestamp: that would require guessing whether the string
is in local wall time or in UTC, i.e. whether it is a tz localize or a tz
convert operation, in pandas terms.

The other examples you give are parsing a non-tz-aware string to a
non-tz-aware timestamp (no ambiguity, so this works fine) and casting a
non-tz-aware timestamp to a tz-aware timestamp. This last case is also
potentially ambiguous, but that cast is a simple zero-copy operation that only
changes the metadata of the timestamp type (adding a timezone). It therefore
treats the input as UTC, not as local wall time; for the latter there is a
dedicated kernel, `pc.assume_timezone`.

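So parsing a non-tz-aware string to a tz-aware timestamp can always be done in
two steps: first parse (cast) the string to a non-tz-aware timestamp, then
convert that to a tz-aware timestamp, either with a cast (values interpreted as
UTC) or with `pc.assume_timezone` (values interpreted as local wall time):
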
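```python
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> # Cast: the wall time is interpreted as UTC
>>> pa.array(["2024-01-01 05:00:00"]).cast(pa.timestamp("s")).cast(pa.timestamp("s", "Europe/Brussels"))
<pyarrow.lib.TimestampArray object at 0x7f065c331960>
[
  2024-01-01 05:00:00Z
]
>>> # assume_timezone: the wall time is interpreted as Europe/Brussels local time
>>> pc.assume_timezone(pa.array(["2024-01-01 05:00:00"]).cast(pa.timestamp("s")), "Europe/Brussels")
<pyarrow.lib.TimestampArray object at 0x7f065c2d26e0>
[
  2024-01-01 04:00:00Z
]
```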