Thanks Jarek and everyone involved, great step!

Shahar

On Sun, Feb 9, 2025, 11:40 Jarek Potiuk <ja...@potiuk.com> wrote:

> Hello,
>
> I have the pleasure of announcing that the providers' move to the new
> structure is now complete. We have no more providers in "providers/src".
> Many, many thanks to all those who helped, truly collaborated and learned
> from each other during the migration process. This was quite a journey: I
> set off the migration and went to Brussels for almost a week of conference
> where I had very little time, yet things were moving at lightning speed
> with only a little of my help and encouragement.
>
> The Airflow community is the best!
>
> Thanks to (in no particular order):
>
>
> *Kalyan, Pratiksha, Elad, Rahul, Shubham, Kunal, Josix, Bugra, LIU ZHE YOU,
> Amogh, Aritra, Nikolas, got686-yandex, David Blain, Ambika, Idris, Ankit,
> Mikhail Dengin, Dennis, Jens *
> And anyone else who I missed. This has been fantastic teamwork :). So many
> people got involved and helped.
>
> *THANK YOU! *
>
> *What do we have now?*
>
> Each provider now has its own pyproject.toml file and is effectively a
> separate sub-project in our monorepo. This enables a few things:
>
> a) you can easily build each provider now with just `hatch build .` or
> `flit build .` or any other frontend - making the providers "modern
> standard PEP-compliant"
> b) you can install the "main" (or any other branch or commit) version of
> the provider using a GitHub URL. This allows, for example, easy testing of
> not-yet-released providers: "developer-focused" users who would like to
> use the "main" version with changes they introduced can now install such
> pre-release providers in their environment very easily.
> c) we can now proceed with the next steps - making core truly independent
> from providers (there are still some references, tests and dependencies
> left), further simplifying our CI, and turning all db-tests in providers
> into non-db tests (to make sure that they are not dependent on the DB while
> we switch to Task SDK) - following steps 2-4 outlined in
> https://github.com/apache/airflow/issues/42632#issuecomment-2449671014
> d) we can remove a lot of code that handled the old ways of doing things,
> where sources of providers were shared with Airflow.
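
A minimal sketch of point b), assuming pip's standard direct-reference syntax for Git URLs with a `#subdirectory=` fragment. The provider name (`amazon`) and the `providers/<name>` path inside the repository are illustrative assumptions, not taken from the announcement, so check the actual layout before running the printed command:

```shell
# Hypothetical example values: pick the provider and the branch/commit you want.
provider="amazon"
ref="main"

# Assumption: the provider sub-project lives at providers/<provider> in the repo.
req="apache-airflow-providers-${provider} @ git+https://github.com/apache/airflow.git@${ref}#subdirectory=providers/${provider}"

# Print the install command for review instead of running it blindly.
echo "pip install \"${req}\""
```

Printing the command first keeps the sketch side-effect free; once the path is confirmed, the printed line can be run as-is in a virtualenv.
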
>
>
> *One watch-out:*
> Currently on macOS you can hit `*too many open files*` errors when running
> `uv sync`. This issue is being worked on by the Astral team in
> https://github.com/astral-sh/uv/issues/11296 (they are happy to have
> Airflow again stretching the limits of `uv` - as they wrote, "airflow is
> their favourite benchmark and test case"). This is in essence caused by a
> very low default limit on the number of open files on macOS (256).
>
> It is easily mitigated by adding `*ulimit -n 2048*` to your .bashrc or
> .zshrc, and we described it in the docs, but it would be nice to have it
> fixed in `uv` eventually so that `uv sync` works out-of-the-box for Airflow
> - I am quite sure that the Astral team will fix it soon. For now I added an
> explanation in
>
> https://github.com/apache/airflow/blob/main/contributing-docs/07_local_virtualenv.rst
> and will further clarify that it should be done in your .rc file to be
> persistent.
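
For reference, a slightly more defensive variant of the `ulimit -n 2048` line for a .bashrc or .zshrc might look like the sketch below. The guard logic is an addition for illustration and not part of the documented mitigation; it only raises the soft limit when it is currently lower than the target and does not abort shell startup if the raise is refused:

```shell
# Target taken from the mitigation described above.
target=2048
current=$(ulimit -n)

# Only raise the soft limit when it is numeric and below the target.
if [ "$current" != "unlimited" ] && [ "$current" -lt "$target" ]; then
    # If the hard limit forbids the raise, warn rather than fail.
    ulimit -n "$target" 2>/dev/null || echo "could not raise open-file limit to $target" >&2
fi
```

The plain one-liner is enough in most setups; the guard mainly avoids lowering a limit that is already higher.
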
>
>
> *What's coming?*
> What's next is a cleanup. We still have quite a lot of duplicated code to
> remove, and a few places where we still manually emulate `uv workspace`
> rather than use it.
>
>
> *Personal note*
> It's been quite a journey for me personally.
>
> Ash had always "complained" about the current setup and we both agreed that
> having a "proper" monorepo with separate sub-projects is a good thing to
> have. But the tooling was not there. The standards were not there for
> years. Python packaging PEPs implemented in the last few years and tooling
> improvements (notably `uv workspace`, which I helped Charlie and the Astral
> team to design to fit our case) had to catch up. Over the last few years
> Python packaging has improved immensely and it's picking up speed. I made
> my first POC to move the providers in December 2022:
> https://github.com/apache/airflow/pull/28292 and the first email I sent
> about it on the devlist was on 12th December 2022:
> https://lists.apache.org/thread/3s5tn1wnvo0cw9vofwmbjl0rkyvhrtbx . But back
> then it would have been far too complex for our contributors to use,
> without all the tooling support and standards.
>
> TP in particular, who is part of the packaging team, has been driving a lot
> of those there and he deserves an absolute shout-out here. He is a bit of
> a silent hero who discusses and participates in many PEPs that we make use
> of.
>
> But even though it was me who mostly pushed and pulled many strings around
> it - and TP who was actively participating in the process - it was all a
> community effort. We not only patiently waited for it but also actively
> helped to move the standards, encouraged them and helped others to
> implement features that we needed. So it's more than 2 years of intense
> work by the packaging team, the introduction of a new tool (`uv`) in the
> packaging space, and us making incremental improvements, switching to
> modern PEP standards in December 2023, and many other small things that
> could be seen as "yak shaving", as some might call it. But eventually all
> those smaller and bigger things were needed to get here.
>
> *And the journey is absolutely not over:*
>
> I am also looking forward to what's coming. I am planning to help in the
> Python community and get involved in (and help to shape) a few other things
> that are in progress and that will (finally) catch up with what Airflow's
> needs are, so that we can get rid of even more custom code we have, improve
> both the development and security of our processes, and better reflect the
> way we (and the Apache Software Foundation) work. I hope to have some more
> time after we complete the current packaging work to help with those - I
> promised it in a few of them, but I have yet to deliver on my promise.
> Anyone in the community here is welcome to help as well; as you can see,
> it eventually pays off.
>
> * https://peps.python.org/pep-0751/ -> *A file format to record Python
> dependencies for installation reproducibility* -> this will finally codify
> what we do as a "poor man's" solution with constraints. I've been waiting
> for that one to be there for years, and there was a rejected version of it
> (TP participated in it) - but it looks like we are getting there to make it
> a "standard" that we - and tooling out there - will just be able to follow
> * https://peps.python.org/pep-0752/ -> *Implicit namespaces for package
> repositories* -> will be helpful for keeping the naming of our packages in
> PyPI consistent and not hijacked
> * https://peps.python.org/pep-0770/ -> *Improving measurability of Python
> packages with Software Bill-of-Materials* -> where we will be able to embed
> the SBOMs we already generate in PyPI metadata
> * https://peps.python.org/pep-0771/ -> *Default Extras for Python Software
> Packages* -> which will allow us to get rid of our custom "preinstalled
> packages"
> * https://peps.python.org/pep-0735/ -> *Dependency Groups in
> pyproject.toml* -> which we already partially use, but once `pip` releases
> it (already merged and planned to be released in 25.1), it will allow us to
> replace our `extras` with dependency groups for development
> ... and more to come ....
>
> We need all these things for our workflows and setup, and so far we have
> had to rely on "custom" band-aid solutions, but the awesome packaging team
> is discussing and implementing things to make all those "first class
> citizens" in Python packaging, and that will let us switch to them.
>
> Looking forward to all those improvements in the (near) future. Looks like
> the next few years will keep me (and others) busy with those.
>
> J.
>
