jorisvandenbossche commented on issue #36110:
URL: https://github.com/apache/arrow/issues/36110#issuecomment-1593720795

   So essentially there is a difference between "localizing" the timestamp on the Python side (using the Python object's tzinfo to get the UTC value) and doing it with Arrow's `assume_timezone` kernel:
   
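   ```python
   from datetime import datetime
   from zoneinfo import ZoneInfo
   import pyarrow as pa
   import pyarrow.compute as pc

   # Path 1: the Python object's tzinfo computes the UTC value during conversion
   ts1 = pa.array([datetime(2038, 4, 1, 3).replace(tzinfo=ZoneInfo('America/Boise'))])
   # Path 2: convert a naive timestamp, then localize it with Arrow's kernel
   ts2 = pc.assume_timezone(pa.array([datetime(2038, 4, 1, 3)]), timezone='America/Boise')
   ```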
   
   In principle, we expect those two to give the same result, but they don't:
   
   ```
   >>> ts1
   <pyarrow.lib.TimestampArray object at 0x7fd96d5b90c0>
   [
     2038-04-01 09:00:00.000000
   ]
   >>> ts2
   <pyarrow.lib.TimestampArray object at 0x7fd96caeb520>
   [
     2038-04-01 10:00:00.000000
   ]
   ```
   
   One possible explanation is a bug in one of the two code paths. However, `ts1`, which is based on the Python object, seems correct: asking Python itself for the UTC value (`datetime(2038, 4, 1, 3).replace(tzinfo=ZoneInfo('America/Boise')).astimezone(timezone.utc)`) indeed gives 09:00. The pyarrow conversion is expected to match that, because in this code path we ask the Python object to convert itself to UTC and don't use our own `assume_timezone` kernel.
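
As a rough cross-check of the 09:00 value that does not rely on any tz database at all, the offset can be worked out by hand. This is a minimal sketch, assuming that America/Boise follows US Mountain Time and that the current US DST rule (DST from 02:00 on the second Sunday of March until 02:00 on the first Sunday of November) still applies in 2038. The helper names are hypothetical and this is not what either library does internally:

```python
from datetime import date, datetime, timedelta


def nth_sunday(year: int, month: int, n: int) -> date:
    """Return the n-th (1-based) Sunday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_to_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))


def mountain_utc_offset_hours(local: datetime) -> int:
    """UTC offset (hours) for a naive US Mountain Time wall-clock value.

    Assumes the post-2007 US DST rule still holds. Ambiguous and
    nonexistent wall times around the transitions are not handled specially.
    """
    dst_start = nth_sunday(local.year, 3, 2)
    dst_end = nth_sunday(local.year, 11, 1)
    start = datetime(dst_start.year, dst_start.month, dst_start.day, 2)
    end = datetime(dst_end.year, dst_end.month, dst_end.day, 2)
    return -6 if start <= local < end else -7  # MDT vs MST


def mountain_local_to_utc(local: datetime) -> datetime:
    """Convert a naive Mountain Time wall-clock value to naive UTC."""
    return local - timedelta(hours=mountain_utc_offset_hours(local))


print(mountain_local_to_utc(datetime(2038, 4, 1, 3)))  # 2038-04-01 09:00:00
```

Under these assumptions, 2038-04-01 03:00 falls inside DST (UTC-6), which gives 09:00 UTC and matches `ts1`. The 10:00 of `ts2` is what the standard-time offset (UTC-7) would give.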
   
   The other possible explanation is a discrepancy between the timezone databases used by Python and by pyarrow, which are not necessarily the same version.
   In my case, Python's `zoneinfo` uses the data included in my conda environment:
   
   ```
   In [35]: zoneinfo.TZPATH
   Out[35]: 
   ('/home/joris/miniconda3/envs/arrow-dev/share/zoneinfo',
    '/home/joris/miniconda3/envs/arrow-dev/share/tzinfo')
   ```
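
To see which file each side actually reads, one can look the key up in the directories being searched. A small sketch: `find_tzif_file` is a hypothetical helper that returns the first match, which only approximates how `zoneinfo` walks `TZPATH` (it can also fall back to the `tzdata` package when no file is found there):

```python
import os
from typing import Iterable, Optional


def find_tzif_file(key: str, search_path: Iterable[str]) -> Optional[str]:
    """Return the first existing file for `key` (e.g. "America/Boise") under
    the given directories, or None if no directory contains it."""
    for root in search_path:
        # Zone keys use "/" as separator; build a platform-appropriate path.
        candidate = os.path.join(root, *key.split("/"))
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, compare `find_tzif_file("America/Boise", zoneinfo.TZPATH)` with `find_tzif_file("America/Boise", ["/usr/share/zoneinfo"])`, keeping in mind that the second directory is only a guess for what Arrow reads.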
   
   By contrast, I believe the vendored `tz.cpp` in Arrow uses the system tzdata (`/usr/share/zoneinfo`, I think?). Those are binary files, so it is not easy to check directly whether the America/Boise data differs between the two.
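
Although the files are binary, their header is simple enough to compare without extra tools. A minimal sketch based on the TZif layout described in RFC 8536 (the helper names are hypothetical) that hashes a zone file and decodes its first header:

```python
import hashlib
import struct
from typing import NamedTuple, Tuple


class TZifHeader(NamedTuple):
    version: str
    isutcnt: int
    isstdcnt: int
    leapcnt: int
    timecnt: int
    typecnt: int
    charcnt: int


def parse_tzif_header(data: bytes) -> TZifHeader:
    """Decode the first 44-byte TZif header (RFC 8536, section 3.1).

    Its counts describe the version-1 (32-bit) data block; files of
    version 2 and later carry a second header after that block.
    """
    if len(data) < 44 or data[:4] != b"TZif":
        raise ValueError("not a TZif file")
    # Version byte: NUL for version 1, otherwise an ASCII digit ("2", "3", ...).
    version = "1" if data[4] == 0 else chr(data[4])
    # Skip 15 reserved bytes, then read six big-endian 32-bit unsigned counts.
    counts = struct.unpack(">6L", data[20:44])
    return TZifHeader(version, *counts)


def summarize(path: str) -> Tuple[str, TZifHeader]:
    """Return the SHA-256 digest and the first TZif header of a zone file."""
    with open(path, "rb") as f:
        data = f.read()
    return hashlib.sha256(data).hexdigest(), parse_tzif_header(data)
```

Running `summarize()` on both America/Boise files shows whether they differ at all. Differing digests or headers only indicate that the data differs, not which file is correct for 2038.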