I agree with all problems that you mention about datetime tz-aware data. I lived for almost 30 years in a country which had in different periods of time up to 10 time zones, and on a regular basis changed it (merge/unmerge) , disable DST, temporarily enable DST. In addition I also worked in a different bank for about 10 years (legacy systems which don't update tzdata for ages) . I think I had most of the bad cases with time zones. And I think everyone somehow has a problem with different time zones: Calendars + events, flight booking systems which don't know about timezones and you might find that your connecting flight flew away an hour ago, etc.
In addition the error might happen in different places, databases (not updated tzdata, or db doesn't work correctly), client libraries, OS, etc. The person who finally solves tz-aware data should be granted all awards in the World. > For example, we got recently bitten by datetime.tzname() (which is supposed to 'time zone name') returning short-hand notation timezones (e.g. PST) > instead of full timezone names (e.g. "Europe/Amsterdam") which makes deserialization non deterministic. Yeah, and even ZoneInfo doesn't solve the problem with `datetime.tzname` because final implementation depends on different factors, tzinfo implementation and internals of datetime. > moving to zoneinfo seems to make sense though and will also be in Pendulum 3 I've have a look couple days ago about zoneinfo, it also have some "pitfalls", e.g. if timezone created from file it can't be easily serialized https://docs.python.org/3.9/library/zoneinfo.html#the-zoneinfo-class > Pendulum has proven us in the past, maybe we indeed should help the project if possible and if that isn't possible verify formal correctness of any other library I guess all other libraries might have a different kind of issue including compatibility with databases. More close replacement it is dateutil, but it also maintained by one person last release was 2 years ago and contains quite a few issues with timezones/DTS (no blame, that is just a fact) On Thu, 28 Sept 2023 at 15:39, Bolke de Bruin <[email protected]> wrote: > Thanks for starting the discussion Andrey. > > Some background on the choice for Pendulum at the time. In the early days > of Airflow it wasn't timezone aware. Originating from Airbnb which had a > reasonable mature data organization the view was everything needs to be in > UTC. According to Maxime the engineers would dream in UTC ;-). However, in > the real world which also needs to deal with legacy that didn't hold. Often > systems of record did not store timezone information but were localized > nevertheless. Cutoff times in banks happen in localized time and if you > want to meet those, Airflow needed to do better. > > Doing timezones and being timezone aware proved to be exceptionally hard. > Many libraries get it wrong [1] and fail silently (i.e. Arrow) or apply DST > transitions wrongly (pytz). When dealing with payments that stuff cannot > happen. To make things worse, in Python timezone support is pretty > convoluted, while some standardization happened in 3.9 by using IANA > provided timezone information from the local system, its API is messy. For > example, we got recently bitten by datetime.tzname() (which is > supposed to 'time > zone name') returning short-hand notation timezones (e.g. PST) instead of > full timezone names (e.g. "Europe/Amsterdam") which makes deserialization > non deterministic. > > So, what I am trying to say, is tread carefully when doing changes as > proposed in [2] (moving to zoneinfo seems to make sense though and will > also be in Pendulum 3). Make sure those changes are formally correct and > don't assume because they are now part of python itself (pytz was the > defacto standard for a long time). Pendulum has proven us in the past, > maybe we indeed should help the project if possible and if that isn't > possible verify formal correctness of any other library. > > Bolke > > [1] https://pendulum.eustace.io/faq/ > [2] https://github.com/apache/airflow/issues/19450 > > On Thu, 28 Sept 2023 at 11:03, Andrey Anshin <[email protected]> > wrote: > > > This discussion is more about the known problem of pendulum and how we > > could deal with it and maybe how we (as Community) might help autor. > > > > The library is mostly supported by a single author Sébastien Eustace ( > > https://github.com/sdispater) and it seems like we bump into the > situation > > which is described in xkcd #2347 ( > > https://imgs.xkcd.com/comics/dependency.png). To be honest it is not > > something new when library mainly supported by one author so there is > > always a risk that the library will no longer be supported / abandoned > > And if takes in account that pendulum provides core functionality in > > Airflow it could have dramatical impact in the future. > > > > Pendulum is a really nice library which helps a lot of developers to work > > with dates/datetimes. However there is one major problem, the last > release > > of this library happened more than 3 years ago ( > > https://pypi.org/project/pendulum/#history) in the time when Airflow > > 1.10.11 was released > > > > Fortunately, the project is not abandoned and on a regular basis commits > > add into the master branch. However these commits are not included into > any > > final release and that's why some things related to datetime don't work > as > > expected in Airflow. There are list of known (for me) issues which are > > affect Airflow > > > > *Memory Leak on parse*: > > - https://github.com/sdispater/pendulum/issues/720, this one fixed 2 > > years > > ago but not available yet ( > https://github.com/sdispater/pendulum/pull/563 > > ). > > Since we use parse dates in airflow codebase: datetime parameters and > > datetime in logs this one could be a reason for memory leakage in > Airflow: > > - https://github.com/apache/airflow/discussions/24694 > > - https://github.com/apache/airflow/discussions/28597 > > > > *Incorrect time zones*, known issues and should be already fixed in > master > > branch > > - https://github.com/sdispater/pendulum/issues/700, Mexico do not use > DST > > anymore > > - https://github.com/sdispater/pendulum/issues/706, Egypt reinstate DST > > > > We add clarification in https://github.com/apache/airflow/pull/30467, > > however it seems like there is no other way rather than patching Pendulum > > right now. > > > > All these issues should be solved as soon as pendulum 3 is released. The > > current announced estimation is end of september/ beginning of October: > > https://github.com/sdispater/pendulum/issues/600#issuecomment-1711299677 > > > > So in theory we would have a fixed version of pendulum soon, and it might > > break something in Airflow but from my point of view it is better than > > current status. > > > > However there might be a situation where the release of the pendulum > would > > be postponed, so maybe better to have a backup plan. What could we do in > > this case? > > > > Maybe we should start to use zoneinfo.ZoneInfo instead of pendulum > > datetime? https://github.com/apache/airflow/issues/19450 > > Pros: > > - stdlib (python 3.9+) > > - In pendulum 3.0 Timezone based on zoneinfo.Zoneinfo > > > > Cons: > > - Current serialization model can't deal with backport packages. E.g. > > timezone which are serialized in backport_zoneinfo can't be deserialized > in > > zoneinfo > > > > Maybe we should replace parse datetime with another solution. Does anyone > > know a good replacement? > > > > Maybe someone from Airflow Community could propose their help with > > maintenance of library: > > - https://github.com/sdispater/pendulum/issues/590 > > > > Maybe we should get rid of the pendulum at all, as a last resort > solution. > > I can't imagine how we could do that, because a lot of stuff depends on > the > > pendulum and removing it would be a breaking change. > > > > ---- > > Best Wishes > > *Andrey Anshin* > > > > > -- > > -- > Bolke de Bruin > [email protected] >
