jorisvandenbossche commented on issue #36110:
URL: https://github.com/apache/arrow/issues/36110#issuecomment-1593720795
So essentially there is a difference between "localizing" it on the Python
side (using the Python object's tzinfo to get the UTC value) and doing it with
Arrow's kernel:
```
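from datetime import datetime
from zoneinfo import ZoneInfo
import pyarrow as pa
import pyarrow.compute as pc

# Localize on the Python side: the tz-aware object converts itself to UTC.
ts1 = pa.array([datetime(2038, 4, 1, 3).replace(tzinfo=ZoneInfo('America/Boise'))])
# Localize with Arrow's assume_timezone kernel instead.
ts2 = pc.assume_timezone(pa.array([datetime(2038, 4, 1, 3)]), timezone='America/Boise')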
```
In principle we expect that those two give the same result, but they don't:
```
>>> ts1
<pyarrow.lib.TimestampArray object at 0x7fd96d5b90c0>
[
2038-04-01 09:00:00.000000
]
>>> ts2
<pyarrow.lib.TimestampArray object at 0x7fd96caeb520>
[
2038-04-01 10:00:00.000000
]
```
One possible explanation is a bug in one of the two code paths. However,
`ts1`, which is based on the Python object, seems correct: asking for the UTC
value using Python APIs (`datetime(2038, 4, 1,
3).replace(tzinfo=ZoneInfo('America/Boise')).astimezone(timezone.utc)`)
indicates this is indeed 09:00. The pyarrow conversion is expected to give the
same result, since we ask the Python object to convert itself to UTC and don't
use our own `assume_timezone` kernel for this operation.
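Written out as a small script, the Python-side check looks roughly like this.
It is only a sketch: `to_utc` is a helper name I made up for illustration, and
the printed values depend on whichever tzdata `zoneinfo` finds on your machine.

```python
from datetime import datetime, timezone, tzinfo
from zoneinfo import ZoneInfo


def to_utc(naive: datetime, tz: tzinfo) -> datetime:
    """Attach tz to a naive wall-clock time and convert the result to UTC."""
    return naive.replace(tzinfo=tz).astimezone(timezone.utc)


if __name__ == "__main__":
    wall = datetime(2038, 4, 1, 3)
    boise = ZoneInfo("America/Boise")  # resolved through zoneinfo.TZPATH
    aware = wall.replace(tzinfo=boise)
    # The offset and abbreviation Python applies to this wall-clock time.
    print(aware.utcoffset(), aware.tzname())
    # The UTC instant that pa.array() should end up storing for ts1.
    print(to_utc(wall, boise))
```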
The other possible explanation is a discrepancy between the timezone
databases used by the two (Python vs pyarrow), since these are not necessarily
the same version.
In my case, python's zoneinfo is using the data included in my conda env:
```
In [35]: zoneinfo.TZPATH
Out[35]:
('/home/joris/miniconda3/envs/arrow-dev/share/zoneinfo',
'/home/joris/miniconda3/envs/arrow-dev/share/tzinfo')
```
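One way to see which tzdata release a given directory holds is to read the
version line at the top of its `tzdata.zi` file. This is a sketch under the
assumption that the directory ships that file and that it starts with
`# version <release>`; not every distribution includes it, in which case the
hypothetical helper `tzdata_version` below returns `None`.

```python
from pathlib import Path
from typing import Optional
import zoneinfo


def tzdata_version(tz_dir: str) -> Optional[str]:
    """Return the release named in tz_dir/tzdata.zi, or None if unavailable."""
    zi = Path(tz_dir) / "tzdata.zi"
    if not zi.is_file():
        return None
    with zi.open(encoding="utf-8") as f:
        first = f.readline().strip()
    # Assumption: the first line looks like "# version 2023c".
    prefix = "# version "
    return first[len(prefix):] if first.startswith(prefix) else None


if __name__ == "__main__":
    # /usr/share/zoneinfo is only a guess at the system location.
    for d in (*zoneinfo.TZPATH, "/usr/share/zoneinfo"):
        print(d, tzdata_version(d))
```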
In contrast, I think that the vendored tz.cpp in Arrow uses the system tzdata
(`/usr/share/zoneinfo`, I think?). Those are binary files, so it is not
straightforward to check whether the info on America/Boise differs between
the two.
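A possible way to compare the binary files anyway is to load the same zone
from each directory with `ZoneInfo.from_file` and compare the UTC offset for
the timestamp in question. This is a sketch: `offsets_by_tzdir` is a
hypothetical helper, and the system path is my guess from above. It also only
shows what Python's parser reads from each file, so it would not reveal a
difference in how the vendored tz.cpp interprets the same data.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, Iterable, Optional
from zoneinfo import ZoneInfo


def offsets_by_tzdir(
    tz_dirs: Iterable[str], key: str, naive: datetime
) -> Dict[str, Optional[timedelta]]:
    """Map each directory that ships `key` to the UTC offset it gives `naive`."""
    result: Dict[str, Optional[timedelta]] = {}
    for d in tz_dirs:
        path = Path(d).joinpath(*key.split("/"))
        if not path.is_file():
            continue  # this directory does not contain the zone file
        with path.open("rb") as f:
            tz = ZoneInfo.from_file(f, key=key)
        result[d] = naive.replace(tzinfo=tz).utcoffset()
    return result


if __name__ == "__main__":
    dirs = [
        "/home/joris/miniconda3/envs/arrow-dev/share/zoneinfo",
        "/usr/share/zoneinfo",  # assumed location of the system tzdata
    ]
    print(offsets_by_tzdir(dirs, "America/Boise", datetime(2038, 4, 1, 3)))
```

If both directories report the same offset, a plain tzdata version mismatch
becomes a less likely explanation for the 09:00 vs 10:00 difference.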