jorisvandenbossche commented on issue #36110:
URL: https://github.com/apache/arrow/issues/36110#issuecomment-1594279063

   Digging a little bit further: it's partly due to a different timezone 
database, but mostly due to a different way of extrapolating into the future 
(i.e. the offset data in the database only goes up to a certain time, so the 
question is which UTC offset to use for a datetime after that). pytz and 
zoneinfo make different choices here: pytz just propagates the last 
offset data point (and so ignores any future DST transitions), 
while zoneinfo uses additional information in the database to predict 
whether a future datetime falls in DST or not. 
   See the answer in 
https://stackoverflow.com/questions/74520944/how-does-zoneinfo-handle-dst-in-the-distant-future
   
   As a small illustration (using some internals of pytz and zoneinfo):
   
   <details>
   
   ```python
   >>> from datetime import datetime, timezone
   >>> dt = datetime(2038, 4, 1, 3)
   
   # pytz last offset data for this timezone is in 2037 and is one without DST
   >>> import pytz
   >>> tz_pytz = pytz.timezone("America/Boise")
   >>> print(tz_pytz._utc_transition_times[-1])
   2037-11-01 08:00:00
   >>> tz_pytz._transition_info[-1]
   (datetime.timedelta(days=-1, seconds=61200), datetime.timedelta(0), 'MST')
   # so we get no DST offset for any datetime in 2038 or later
   >>> tz_pytz.dst(dt)
   datetime.timedelta(0)
   >>> tz_pytz.utcoffset(dt)
   datetime.timedelta(days=-1, seconds=61200)  # UTC offset of -07:00
   >>> print(pytz.timezone("America/Boise").localize(dt))
   2038-04-01 03:00:00-07:00
   >>> print(pytz.timezone("America/Boise").localize(dt).astimezone(timezone.utc))
   2038-04-01 10:00:00+00:00
   
   # using zoneinfo in my conda env, it only has data up to 2007 and the last offset is a DST one
   >>> from zoneinfo._zoneinfo import ZoneInfo  # importing the Python implementation (not the C one), so I can hack around
   >>> tz = ZoneInfo.no_cache("America/Boise")
   >>> print(datetime.fromtimestamp(tz._trans_utc[-1]))
   2007-03-11 10:00:00
   >>> tz._ttinfos[-1]
   _ttinfo(-1 day, 18:00:00, 1:00:00, MDT)
   
   # we can let zoneinfo use the data from pytz
   >>> from zoneinfo._tzpath import reset_tzpath
   >>> reset_tzpath(("/home/joris/miniconda3/envs/arrow-dev/lib/python3.10/site-packages/pytz/zoneinfo",))
   >>> tz = ZoneInfo.no_cache("America/Boise")
   >>> print(datetime.fromtimestamp(tz._trans_utc[-1]))
   2037-11-01 09:00:00
   >>> tz._ttinfos[-1]
   _ttinfo(-1 day, 17:00:00, 0:00:00, MST)
   
   # the last offset is now a non-DST one, but because zoneinfo uses a rule to determine DST for
   # future datetimes, the date of 2038-04-01 still uses a DST offset
   >>> tz.dst(dt)
   datetime.timedelta(seconds=3600)
   >>> tz.utcoffset(dt)
   datetime.timedelta(days=-1, seconds=64800)  # UTC offset of -06:00
   >>> print(dt.replace(tzinfo=tz))
   2038-04-01 03:00:00-06:00
   >>> print(dt.replace(tzinfo=tz).astimezone(timezone.utc))
   2038-04-01 09:00:00+00:00
   
   # but with a small hack we can disable this "rule-based" determination for future datetimes and let it
   # use the last offset data point, similar to the logic in pytz. And now we get the same result as pytz:
   >>> tz._tz_after = tz._ttinfos[-1]
   >>> tz.dst(dt)
   datetime.timedelta(0)   # no DST offset
   >>> tz.utcoffset(dt)
   datetime.timedelta(days=-1, seconds=61200)  # UTC offset of -07:00
   >>> print(dt.replace(tzinfo=tz))
   2038-04-01 03:00:00-07:00
   >>> print(dt.replace(tzinfo=tz).astimezone(timezone.utc))
   2038-04-01 10:00:00+00:00
   ```
   
   </details>
   
   So that explains the different UTC value we get depending on whether the 
Python datetime object was using a `pytz` or `zoneinfo` timezone (which also 
explains https://github.com/apache/arrow/issues/15047#issuecomment-1593598589). 
   Of course, we also saw different behaviour for our own `assume_timezone` 
kernel. Based on the result, I assume that it follows the same logic as pytz 
(extending the last offset into the future) and doesn't support this rule-based 
DST determination for future dates. This seems to be confirmed by the comment 
at https://github.com/HowardHinnant/date/issues/563#issuecomment-607439821
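
To make the two strategies concrete without relying on either library's internals, here is a minimal standard-library sketch. The function names (`nth_sunday`, `offset_last_transition`, `offset_rule_based`) are hypothetical, and the DST rule is hard-coded as an assumption: the current US rule (second Sunday of March to first Sunday of November, switching at 02:00 local time), which is what a POSIX-style rule such as `MST7MDT,M3.2.0,M11.1.0` expresses. This is not how pytz or zoneinfo are implemented, only an illustration of why the two approaches end up one hour apart for the datetime used above.

```python
from datetime import datetime, timedelta

MST = timedelta(hours=-7)  # standard time offset for America/Boise
MDT = timedelta(hours=-6)  # daylight saving time offset

# Last tabulated offset in the pytz-bundled data shown above (switch to MST in Nov 2037).
LAST_OFFSET = MST


def nth_sunday(year, month, n):
    """Return midnight on the n-th Sunday of the given month (hypothetical helper)."""
    first = datetime(year, month, 1)
    # weekday(): Monday == 0 ... Sunday == 6
    days_to_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))


def offset_last_transition(local_dt):
    """pytz-like strategy: keep the last tabulated offset forever."""
    return LAST_OFFSET


def offset_rule_based(local_dt):
    """zoneinfo-like strategy: apply a recurring DST rule (assumed US rule)."""
    start = nth_sunday(local_dt.year, 3, 2).replace(hour=2)
    end = nth_sunday(local_dt.year, 11, 1).replace(hour=2)
    return MDT if start <= local_dt < end else MST


dt = datetime(2038, 4, 1, 3)  # naive local wall time, as in the example above

# UTC = local time - UTC offset
utc_last = dt - offset_last_transition(dt)
utc_rule = dt - offset_rule_based(dt)

print(utc_last)  # 2038-04-01 10:00:00
print(utc_rule)  # 2038-04-01 09:00:00
```

The one-hour gap matches the difference between the pytz and zoneinfo results shown earlier. The sketch ignores ambiguous and non-existent local times around the transitions.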

