Hello,

I have the pleasure of announcing that the providers' move to the new structure
is now complete. We have no more providers in "providers/src". Many, many
thanks to all those who helped, truly collaborated and learned from each
other during the migration process. This was quite a journey: I
set off the migration, then went to Brussels for almost a week of conference
where I had very little time, yet things kept moving at lightning
speed with only a little help and encouragement from me.

The Airflow community is the best!

Thanks to (in no particular order):


*Kalyan, Pratiksha, Elad, Rahul, Shubham, Kunal, Josix, Bugra, LIU ZHE YOU,
Amogh, Aritra, Nikolas, got686-yandex, David Blain, Ambika, Idris, Ankit,
Mikhail Dengin, Dennis, Jens*
And anyone else whom I missed. This has been fantastic teamwork :). So many
people got involved and helped.

*THANK YOU!*

*What do we have now?*

Each provider now has its own pyproject.toml file and is effectively a
separate sub-project in our monorepo. This enables a few things:

a) you can easily build each provider now with just `hatch build .` or
`flit build .` or any other frontend - making the providers "modern
standard PEP-compliant"
b) you can install the "main" (or any other branch or commit) version of
a provider using a GitHub URL. This allows for easy testing of
not-yet-released providers: for example, "developer-focused" users who would
like to use the "main" version with changes they introduced can now
install such pre-release providers in their environment very easily.
c) we can now proceed with the next steps - making core truly
independent from providers (there are still some references, tests and
dependencies left), further simplifying our CI and
turning all db-tests in providers into non-db tests (to make sure that they
are not dependent on the DB while we switch to Task SDK) - following steps
2-4 outlined in
https://github.com/apache/airflow/issues/42632#issuecomment-2449671014
d) we can remove a lot of code that handled the old way of doing things, where
sources of providers were shared with Airflow.
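
To make point b) a bit more concrete, here is a minimal sketch of how such a GitHub install URL could be assembled. The helper name `provider_requirement` is hypothetical, and the sketch assumes that a provider with id `cncf.kubernetes` lives in `providers/cncf/kubernetes` in the repository and is published as `apache-airflow-providers-cncf-kubernetes`; please check the actual layout of the branch or commit you are installing from.

```python
def provider_requirement(provider_id: str, ref: str = "main") -> str:
    """Build a pip direct-reference requirement for one provider.

    Assumptions (not confirmed by the announcement itself):
    - the provider sources live under providers/<id with dots as slashes>
    - the distribution is named apache-airflow-providers-<id with dots as dashes>
    """
    dist_name = "apache-airflow-providers-" + provider_id.replace(".", "-")
    subdirectory = "providers/" + provider_id.replace(".", "/")
    # pip understands "name @ git+<repo>@<ref>#subdirectory=<path>"
    return (
        f"{dist_name} @ git+https://github.com/apache/airflow.git"
        f"@{ref}#subdirectory={subdirectory}"
    )


if __name__ == "__main__":
    # Print a command you could paste into a shell (quotes protect the spaces).
    print("pip install '" + provider_requirement("amazon") + "'")
```

The `ref` argument can be a branch name, a tag or a commit hash, which matches the "main (or any other branch or commit)" wording above.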


*One watchout:*
Currently on macOS you can hit `too many open files` errors when running
`uv sync`. This issue is being worked on by the Astral team in
https://github.com/astral-sh/uv/issues/11296 (they are happy to have
Airflow stretching the limits of `uv` again - as they wrote, "airflow is
their favourite benchmark and test case"). This is in essence caused by the
very low default limit that macOS sets on the number of open files (256).

It is easily mitigated by adding `ulimit -n 2048` to your .bashrc or
.zshrc, and we described it in the docs, but it would be nice to have it
fixed in `uv` eventually and get `uv sync` working out-of-the-box for Airflow
- I am quite sure that the Astral team will fix it soon. For now I added an
explanation in
https://github.com/apache/airflow/blob/main/contributing-docs/07_local_virtualenv.rst
and will further clarify that it should be done in your .rc file to be
persistent.
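
If you are not sure whether your shell already has a high enough limit, a small diagnostic like the sketch below can tell you. It only reads the limit that the Python process inherited from the shell that started it and does not change anything; the function name `open_files_limit_too_low` is made up for this example, the threshold of 2048 is simply the value suggested above, and the `resource` module is available on Unix-like systems only.

```python
import resource

REQUIRED_OPEN_FILES = 2048  # the value suggested for `ulimit -n` above


def open_files_limit_too_low(soft_limit: int, required: int = REQUIRED_OPEN_FILES) -> bool:
    """Return True when the soft open-files limit is below the required value."""
    # An "unlimited" soft limit is reported as RLIM_INFINITY (which can be -1),
    # so handle it explicitly before comparing numbers.
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return soft_limit < required


# The process inherits its limits from the shell that launched it.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if open_files_limit_too_low(soft):
    print(f"Soft limit is {soft}: consider `ulimit -n {REQUIRED_OPEN_FILES}` in your shell rc file")
else:
    print(f"Soft limit is {soft}: no change needed")
```

Run it from the same terminal in which you would run `uv sync`, so that it reports the limit that `uv` would actually see.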


*What's coming?*
What's next is a cleanup. We still have quite a lot of duplicated code to
remove, and a few places where we still manually emulate `uv workspace`
rather than use it.


*Personal note*
It's been quite a journey for me personally.

Ash had always "complained" about the old setup and we both agreed that
having a "proper" monorepo with separate sub-projects is a good thing to
have. But for years the tooling and the standards were not there. The
Python packaging PEPs implemented in the last few years and the tooling
improvements (notably `uv workspace`, which I helped Charlie and the Astral
team to design to fit our case) had to catch up. Over the last few years
Python packaging has improved immensely and it's picking up speed. I made my
first POC to move the providers in December 2022:
https://github.com/apache/airflow/pull/28292 and the first email I sent
about it to the devlist was on 12th December 2022:
https://lists.apache.org/thread/3s5tn1wnvo0cw9vofwmbjl0rkyvhrtbx . But back
then it would have been far too complex for our contributors to use, without
all the tooling support and standards.

TP in particular, who is a member of the packaging team, has been driving a
lot of this there and he deserves an absolute shout-out here.
He is a bit of a silent hero who discusses and participates in many of the
PEPs that we make use of.

But even though it was me who mostly pushed and pulled many strings around
it - and TP who was actively participating in the process - it was all a
community effort. We not only patiently waited for it but also actively
helped to move the standards forward, encouraged them and helped others to
implement features that we needed. So it took more than 2 years of intense
work by the packaging team, the introduction of a new tool (`uv`) in the
packaging space and us making incremental improvements, switching to modern
PEP standards in December 2023, and many other small things that could be
seen as "yak shaving", as some might call it, but all those smaller and
bigger things were eventually needed to get here.

*And the journey is absolutely not over:*

I am also looking forward to what's coming, and I am planning to help
in the Python community and get involved in (and help to shape) a few other
things that are in progress and that will (finally) catch up with what
Airflow's needs are, so that we can get rid of even more of the custom code
we have, improve both the development and the security of our processes, and
better reflect the way we (and the Apache Software Foundation) work. I hope
to have some more time after we complete the current packaging work to help
with those - I promised it in a few of those, but I have yet to deliver on
my promise. Anyone in the community here is welcome to help as well; as you
can see, it eventually pays off.

* https://peps.python.org/pep-0751/ -> *A file format to record Python
dependencies for installation reproducibility* -> this will finally codify
what we do as a "poor man's" solution with constraints. I've been waiting
for that one for years, and there was a rejected version of it
(TP participated in it) - but it looks like we are getting there and it will
become a "standard" that we - and the tooling out there - will just be able
to follow
* https://peps.python.org/pep-0752/ -> *Implicit namespaces for package
repositories* -> will help keep the naming of our packages in PyPI
consistent and protect the names from being hijacked
* https://peps.python.org/pep-0770/ -> *Improving measurability of Python
packages with Software Bill-of-Materials* -> will let us embed the SBOMs we
already generate in PyPI metadata
* https://peps.python.org/pep-0771/ -> *Default Extras for Python Software
Packages* -> will allow us to get rid of our custom "preinstalled
packages"
* https://peps.python.org/pep-0735/ -> *Dependency Groups in pyproject.toml*
-> which we already partially use; once `pip` releases support for it
(already merged and planned to be released in 25.1), it will allow us to
replace our development `extras` with dependency groups

... and more to come ....

We need all these things for our workflows and setup, and so far we have had
to build "custom" band-aid solutions, but the awesome packaging team is
discussing and implementing things to make all of them "first class citizens"
in Python packaging, which will let us switch to those.

Looking forward to all those improvements in the (near) future. Looks like
the next few years will keep me (and others) busy with those.

J.
