I believe some discussion on this has already occurred: https://github.com/apache/airflow/issues/19450 (in fact, I raised some concerns there: https://github.com/apache/airflow/issues/19450#issuecomment-1536494966).
I guess I remain concerned that the zoneinfo API isn't sufficient for many of the places where Pendulum is used, and that it isn't sufficiently mature (in the sense that not enough people are relying on it). That could be doubly bad in the standard library, because bugs will remain permanently in specific versions of Python.

I appreciate this is a balancing issue: if Pendulum doesn't make new releases then it can't be relied on forever.

Damian

-----Original Message-----
From: Bolke de Bruin <[email protected]>
Sent: Thursday, September 28, 2023 10:03 AM
To: [email protected]
Subject: Re: [DISCUSS] Future of Pendulum in Airflow

FYI: I've just added https://github.com/apache/airflow/pull/34667, which documents how to use newer timezone information with Pendulum. Also, work seems to be progressing (albeit slowly) on Pendulum 3: https://github.com/sdispater/pendulum/issues/600#issuecomment-1711299677

Bolke

On Thu, 28 Sept 2023 at 15:12, Bolke de Bruin <[email protected]> wrote:

> For serialization I am not too worried about ZoneInfo. We do not use
> pickling by default as we roll our own serialization format. We
> probably just need the key (zoneinfo.key).
>
> I'm not sure what happened about this:
> https://github.com/sdispater/pendulum/issues/590
>
> Bolke
>
> On Thu, 28 Sept 2023 at 14:59, Andrey Anshin <[email protected]> wrote:
>
>> I agree with all problems that you mention about datetime tz-aware
>> data. I lived for almost 30 years in a country which had, in
>> different periods of time, up to 10 time zones, and which on a
>> regular basis changed them (merge/unmerge), disabled DST, or
>> temporarily enabled DST. In addition I also worked in a different
>> bank for about 10 years (legacy systems which don't update tzdata
>> for ages). I think I had most of the bad cases with time zones.
>> And I think everyone somehow has a problem with different time
>> zones: calendars + events, flight booking systems which don't know
>> about timezones so you might find that your connecting flight flew
>> away an hour ago, etc.
>>
>> In addition the error might happen in different places: databases
>> (tzdata not updated, or the db doesn't work correctly), client
>> libraries, OS, etc. The person who finally solves tz-aware data
>> should be granted all awards in the World.
>>
>> > For example, we got recently bitten by datetime.tzname() (which is
>> > supposed to return the 'time zone name') returning short-hand
>> > notation timezones (e.g. PST) instead of full timezone names (e.g.
>> > "Europe/Amsterdam"), which makes deserialization non-deterministic.
>>
>> Yeah, and even ZoneInfo doesn't solve the problem with
>> `datetime.tzname` because the final implementation depends on
>> different factors: the tzinfo implementation and the internals of
>> datetime.
>>
>> > moving to zoneinfo seems to make sense though and will also be in
>> > Pendulum 3
>>
>> I had a look at zoneinfo a couple of days ago. It also has some
>> "pitfalls", e.g. if a timezone is created from a file it can't be
>> easily serialized:
>> https://docs.python.org/3.9/library/zoneinfo.html#the-zoneinfo-class
>>
>> > Pendulum has proven itself to us in the past; maybe we indeed
>> > should help the project if possible, and if that isn't possible,
>> > verify formal correctness of any other library
>>
>> I guess all other libraries might have a different kind of issue,
>> including compatibility with databases. The closest replacement is
>> dateutil, but it is also maintained by one person, its last release
>> was 2 years ago, and it contains quite a few issues with
>> timezones/DST (no blame, that is just a fact).
>>
>>
>> On Thu, 28 Sept 2023 at 15:39, Bolke de Bruin <[email protected]> wrote:
>>
>> > Thanks for starting the discussion Andrey.
>> >
>> > Some background on the choice for Pendulum at the time.
>> > In the early days of Airflow it wasn't timezone aware. Originating
>> > from Airbnb, which had a reasonably mature data organization, the
>> > view was that everything needs to be in UTC. According to Maxime
>> > the engineers would dream in UTC ;-). However, in the real world,
>> > which also needs to deal with legacy, that didn't hold. Often
>> > systems of record did not store timezone information but were
>> > localized nevertheless. Cutoff times in banks happen in localized
>> > time and if you want to meet those, Airflow needed to do better.
>> >
>> > Doing timezones and being timezone aware proved to be exceptionally
>> > hard. Many libraries get it wrong [1] and fail silently (e.g.
>> > Arrow) or apply DST transitions wrongly (pytz). When dealing with
>> > payments that stuff cannot happen. To make things worse, timezone
>> > support in Python is pretty convoluted. While some standardization
>> > happened in 3.9 by using IANA-provided timezone information from
>> > the local system, its API is messy. For example, we got recently
>> > bitten by datetime.tzname() (which is supposed to return the 'time
>> > zone name') returning short-hand notation timezones (e.g. PST)
>> > instead of full timezone names (e.g. "Europe/Amsterdam"), which
>> > makes deserialization non-deterministic.
>> >
>> > So, what I am trying to say is: tread carefully when making changes
>> > as proposed in [2] (moving to zoneinfo seems to make sense though
>> > and will also be in Pendulum 3). Make sure those changes are
>> > formally correct and don't assume they are just because they are
>> > now part of Python itself (pytz was the de facto standard for a
>> > long time). Pendulum has proven itself to us in the past; maybe we
>> > indeed should help the project if possible, and if that isn't
>> > possible, verify formal correctness of any other library.
>> >
>> > Bolke
>> >
>> > [1] https://pendulum.eustace.io/faq/
>> > [2] https://github.com/apache/airflow/issues/19450
>> >
>> > On Thu, 28 Sept 2023 at 11:03, Andrey Anshin <[email protected]> wrote:
>> >
>> > > This discussion is more about the known problems of pendulum, how
>> > > we could deal with them, and maybe how we (as a Community) might
>> > > help the author.
>> > >
>> > > The library is mostly supported by a single author, Sébastien
>> > > Eustace (https://github.com/sdispater), and it seems like we have
>> > > bumped into the situation described in xkcd #2347
>> > > (https://imgs.xkcd.com/comics/dependency.png). To be honest, it is
>> > > nothing new for a library to be mainly supported by one author,
>> > > so there is always a risk that the library will no longer be
>> > > supported or will be abandoned. And if we take into account that
>> > > pendulum provides core functionality in Airflow, this could have
>> > > a dramatic impact in the future.
>> > >
>> > > Pendulum is a really nice library which helps a lot of developers
>> > > to work with dates/datetimes. However there is one major problem:
>> > > the last release of this library happened more than 3 years ago
>> > > (https://pypi.org/project/pendulum/#history), at the time when
>> > > Airflow 1.10.11 was released.
>> > >
>> > > Fortunately, the project is not abandoned and commits are added
>> > > to the master branch on a regular basis. However these commits
>> > > are not included in any final release, and that's why some things
>> > > related to datetime don't work as expected in Airflow. Here is a
>> > > list of the issues known to me which affect Airflow.
>> > >
>> > > *Memory Leak on parse*:
>> > > - https://github.com/sdispater/pendulum/issues/720, this one was
>> > >   fixed 2 years ago but the fix is not available yet
>> > >   (https://github.com/sdispater/pendulum/pull/563).
>> > > Since we parse dates in the Airflow codebase (datetime parameters
>> > > and datetimes in logs), this one could be a reason for memory
>> > > leakage in Airflow:
>> > > - https://github.com/apache/airflow/discussions/24694
>> > > - https://github.com/apache/airflow/discussions/28597
>> > >
>> > > *Incorrect time zones*, known issues which should already be fixed
>> > > in the master branch:
>> > > - https://github.com/sdispater/pendulum/issues/700, Mexico does not
>> > >   use DST anymore
>> > > - https://github.com/sdispater/pendulum/issues/706, Egypt
>> > >   reinstates DST
>> > >
>> > > We added a clarification in
>> > > https://github.com/apache/airflow/pull/30467, however it seems
>> > > like there is no other way than patching Pendulum right now.
>> > >
>> > > All these issues should be solved as soon as pendulum 3 is
>> > > released. The currently announced estimate is end of September /
>> > > beginning of October:
>> > > https://github.com/sdispater/pendulum/issues/600#issuecomment-1711299677
>> > >
>> > > So in theory we would have a fixed version of pendulum soon. It
>> > > might break something in Airflow, but from my point of view that
>> > > is better than the current status.
>> > >
>> > > However there might be a situation where the release of pendulum
>> > > is postponed, so maybe it is better to have a backup plan. What
>> > > could we do in this case?
>> > >
>> > > Maybe we should start to use zoneinfo.ZoneInfo instead of pendulum
>> > > datetime? https://github.com/apache/airflow/issues/19450
>> > > Pros:
>> > > - stdlib (python 3.9+)
>> > > - In pendulum 3.0, Timezone is based on zoneinfo.ZoneInfo
>> > >
>> > > Cons:
>> > > - The current serialization model can't deal with backport
>> > >   packages. E.g. timezones which are serialized in
>> > >   backport_zoneinfo can't be deserialized in zoneinfo
>> > >
>> > > Maybe we should replace parse datetime with another solution.
>> > > Does anyone know a good replacement?
>> > >
>> > > Maybe someone from the Airflow Community could offer their help
>> > > with maintenance of the library:
>> > > - https://github.com/sdispater/pendulum/issues/590
>> > >
>> > > Maybe we should get rid of pendulum altogether, as a last resort.
>> > > I can't imagine how we could do that, because a lot of stuff
>> > > depends on pendulum and removing it would be a breaking change.
>> > >
>> > > ----
>> > > Best Wishes
>> > > *Andrey Anshin*
>> > >
>> >
>> >
>> > --
>> > Bolke de Bruin
>> > [email protected]
>> >
>
> --
> Bolke de Bruin
> [email protected]

--
Bolke de Bruin
[email protected]
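
For reference, the tzname() versus key point raised in the thread can be sketched with the standard library alone. This is only an illustration under stated assumptions, not Airflow's actual serialization code: the helper names tz_to_key and key_to_tz are hypothetical, and the sketch assumes Python 3.9+ with IANA timezone data available on the system (or via the tzdata package).

```python
from datetime import datetime, tzinfo
from zoneinfo import ZoneInfo


def tz_to_key(tz: tzinfo) -> str:
    """Hypothetical helper: serialize a timezone by its IANA key."""
    # ZoneInfo exposes the IANA name as .key; it is None for zones built
    # with ZoneInfo.from_file() without a key, and absent on other tzinfo types.
    key = getattr(tz, "key", None)
    if key is None:
        raise ValueError("timezone has no IANA key, cannot round-trip it")
    return key


def key_to_tz(key: str) -> ZoneInfo:
    """Hypothetical helper: rebuild the timezone from its IANA key."""
    return ZoneInfo(key)


original = datetime(2023, 9, 28, 15, 12, tzinfo=ZoneInfo("Europe/Amsterdam"))

# tzname() yields an abbreviation, which several zones can share, so it
# is not a safe value to deserialize from.
abbreviation = original.tzname()

# The key is the full IANA name and round-trips deterministically.
key = tz_to_key(original.tzinfo)
restored = original.replace(tzinfo=key_to_tz(key))

print(abbreviation, key, restored == original)
```

The helper raises instead of falling back to tzname(), which matches the "pitfall" mentioned above for zones created from a file: without a key there is nothing deterministic to store.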
