Hey everyone,

I got the PR green: https://github.com/apache/airflow/pull/36537 . I got a really comprehensive review over a number of iterations with Jens (and approval! yay!!) plus a number of comments from TP.
I would love to have some feedback from others before merging. I still want to go through the packages prepared with hatch (I will do it tomorrow) to make sure we have not lost (or added) too much from the packages, and to add appropriate inclusions/exclusions - but other than that, I think it could be merged even today. I'd love some more comments - especially from those who recently struggled with local venv/editable installation and dependency management/adding provider dependencies - as the way it is done now should be WAY simpler and better.

Just to repeat what we get with that one:

1. cutting-edge support for Python packaging standards (see previous mail in the thread) - with complete configuration for the project in a single pyproject.toml file. It allows using any modern build frontend for development (hatch, pip, poetry, pipenv etc.)
2. nicer integration with IDEs (PyCharm/VSCode etc.), including installing and managing dependencies
3. nicely and logically organized dependencies - including devel dependencies + extras per provider, nicely managed from provider.yaml
4. seamlessly working `pip install --editable .` (it was hacked before, and not working in recent `pip` versions - now it will **just work**)
5. a way to easily install provider devel dependencies for testing in local venv (`pip install -e ".[amazon,google]"`)
6. hatch as recommended (but not mandatory) frontend that supports out-of-the-box:
   a) installing Python interpreters (`hatch python install all`)
   b) creating local venvs (`hatch env create`, `hatch shell`, `hatch env create airflow-311` and so on)
   c) building packages for release (`hatch build -t custom -t wheel -t sdist`)
   d) later we will use more things that hatch gives us (reproducible builds, publishing to PyPI, possibly local testing and code formatting, better monorepo organization in the future).
7. Updated documentation for all the above.
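To make point 3 a bit more concrete, here is a minimal sketch of how per-provider extras could be derived from provider dependency lists. It is not the code from the PR: the function names and the sample dependency data (including the version pins) are made up for illustration. The only rule taken from a standard is the extra-name normalization from PEP 685 (runs of `-`, `_` and `.` collapse into a single `-`, and the name is lowercased).

```python
from __future__ import annotations

import re


def normalize_extra(name: str) -> str:
    """Normalize an extra name following the PEP 685 rule."""
    return re.sub(r"[-_.]+", "-", name).lower()


def build_optional_dependencies(providers: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map {provider id: dependencies} to a sorted extras mapping.

    The result has the shape of a [project.optional-dependencies] table:
    one extra per provider, with de-duplicated and sorted requirements.
    """
    extras: dict[str, list[str]] = {}
    for provider_id, dependencies in providers.items():
        extras[normalize_extra(provider_id)] = sorted(set(dependencies))
    return dict(sorted(extras.items()))


# Hypothetical data standing in for the "dependencies" lists of two
# provider.yaml files (names and version pins are illustrative only).
sample_providers = {
    "google": ["google-cloud-storage>=2.7.0", "google-auth>=1.0.0"],
    "apache.hive": ["pyhive>=0.7.0"],
}

optional_dependencies = build_optional_dependencies(sample_providers)
for extra, requirements in optional_dependencies.items():
    print(f"{extra} = {requirements}")
```

Keeping the generated mapping sorted means a regenerated file only changes when a provider's dependencies actually change, which is what you want from an automated sync step.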

Note: It does not replace Breeze for reproducing and optimizing our CI build (Breeze has way more optimizations and customisations needed for Airflow). However, it makes the LOCAL_VIRTUALENV option of running tests and developing Airflow much easier to manage and keep under control.

Just as a teaser - here is the output of `hatch env show`:

┏━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name        ┃ Type    ┃ Features ┃ Description                                                   ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ default     │ virtual │ devel    │ Default environment with Python 3.8 for maximum compatibility │
├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
│ airflow-38  │ virtual │          │ Environment with Python 3.8. No devel installed.              │
├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
│ airflow-39  │ virtual │          │ Environment with Python 3.9. No devel installed.              │
├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
│ airflow-310 │ virtual │          │ Environment with Python 3.10. No devel installed.             │
├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
│ airflow-311 │ virtual │          │ Environment with Python 3.11. No devel installed              │
└─────────────┴─────────┴──────────┴───────────────────────────────────────────────────────────────┘

J.

On Sun, Jan 7, 2024 at 11:55 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> Ah... And compared to the original proposal I simplified it a LOT. Generally speaking, for both contributors and users, the way you install Airflow for use and for contribution is "standard" and basically just "fixes" what has been broken - i.e.
> you just install it as expected:
>
> * `pip install apache-airflow[google]` or `pip install .[google]` will install airflow + google provider (user story)
> * `pip install -e .[google]` will install airflow + all google provider dependencies in editable mode - ready to run tests
>
> Plus Airflow follows all the PEP standards, so it is compatible with all the modern tooling for Python packaging. Here is the list of PEPs that it makes Airflow generally compatible with:
>
> * `PEP-440 Version Identification and Dependency Specification <https://www.python.org/dev/peps/pep-0440/>`__
> * `PEP-517 A build-system independent format for source trees <https://www.python.org/dev/peps/pep-0517/>`__
> * `PEP-518 Specifying Minimum Build System Requirements for Python Projects <https://www.python.org/dev/peps/pep-0518/>`__
> * `PEP-561 Distributing and Packaging Type Information <https://www.python.org/dev/peps/pep-0561/>`__
> * `PEP-621 Storing project metadata in pyproject.toml <https://www.python.org/dev/peps/pep-0621/>`__
> * `PEP-685 Comparison of extra names for optional distribution dependencies <https://www.python.org/dev/peps/pep-0685/>`__
>
> J.
>
> On Sun, Jan 7, 2024 at 11:27 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > Hello everyone,
> >
> > I iterated quite a bit on the PR and I think it's ready for an even more serious review: https://github.com/apache/airflow/pull/36537 . I solved all of the TODOs and teething problems and, while it likely still has some tests to fix, all the build and packaging pieces, local installation and even developer/contributor documentation should already be in a state that is ready for serious scrutiny. Thanks to Jens and TP for the reviews so far - I addressed all of the comments already - and there are just 2 conversations left to resolve.
> >
> > See the comment for status summary: https://github.com/apache/airflow/pull/36537#issuecomment-1880193452
> >
> > BTW.
> > I found it really useful to follow the "unresolved conversation" routine - it's really nice to see such things as a summary (see attachment) and be able to see that there are still 2 conversations to resolve.
> > That's the in-progress experiment with conversations, which I personally like a lot so far. It already saved me from merging a PR that still had things to resolve.
> >
> > J.
> >
> > On Thu, Jan 4, 2024 at 8:04 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > I slept on it for a few nights and got away from it, and I have an idea to simplify it quite a bit - i.e. cut the number of extras by half and make virtually 0 impact on the current editable installation, so you might want to hold on a bit with that (unless you want to see it changing :) ) ... The whole concept won't change, I just realized that I do not need to add new `editable_` extras to achieve the same effect.
> > >
> > > I will also attempt to split it a bit to make it easier to review.
> > >
> > > Hold tight :) - but also feel free to look and comment even now :)
> > >
> > > And yes. Exciting. It kept me awake a night or two where I could not get to sleep until I finally got it working :D
> > >
> > > J
> > >
> > > On Thu, Jan 4, 2024 at 6:52 PM Pierre Jeambrun <pierrejb...@gmail.com> wrote:
> > > >
> > > > I personally think that this is a great idea. I have been following the hatch project for a while and I am convinced it has a lot to offer for airflow. The two big pros for me are its ease of use (backend and front end) as well as the security aspects it covers (reproducible builds to name one).
> > > >
> > > > I will take a look at the PR later this week, but it definitely sounds exciting.
> > > >
> > > > On Tue 2 Jan 2024 at 20:26, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > >
> > > > > Hello everyone.
> > > > >
> > > > > TL;DR: I have a proposal to adopt Hatchling as a build backend (and recommend, but not require, Hatch as frontend) for Airflow as our way of switching to a PEP-standard compliant pyproject.toml way of installing Airflow (including local venvs) and building the Airflow package.
> > > > >
> > > > > I have a working implementation (that needs polishing, taking a few less important decisions and some rather simple TODOs). Here is the draft PR: https://github.com/apache/airflow/pull/36537
> > > > >
> > > > > I've spent the better part of the Xmas/New Year's break on implementing it - something that we've been discussing for - literally - years - and several people (including myself) made several attempts in the past - unsuccessfully - at standardising the Python packaging/build process for Airflow to use modern standard-driven tooling.
> > > > >
> > > > > I think I succeeded. Finally.
> > > > >
> > > > > In short, what it means:
> > > > >
> > > > > When this change is merged, Airflow will have a nice, slick, modern, standard-compliant contributor experience - with an editable installation that will **just work**, that will work with multiple build front-ends, and that will make it very easy to install and manage local virtualenv(s) to contribute to Airflow. The extras structure and Airflow configuration will be in one place (pyproject.toml) and it will be much easier to reason about our extras and dependencies. As a bonus point - with tools like Hatch, contributors will get the canonical way of managing local virtualenvs for Airflow development and a very easy recommended way to manage both Python and venvs (but without forcing a single frontend).
> > > > >
> > > > > From the user perspective, Airflow packages will be more standardised, with just user extras defined. For maintainers and PMC members, we will get reproducible builds (similar to what we have now for Providers) - which means that it will be easier and more robust to verify the provenance of the packages (security!)
> > > > >
> > > > > Why can we do it now when we could not do it before?
> > > > >
> > > > > This is mostly thanks to the Herculean efforts of the Python Packaging team (hats off to TP for being part of the team and leading a lot of standardisation efforts there) - after a few years of relentless introduction and implementation of many PEPs and releasing new tooling (particularly Hatch, but also Flit that we already use for providers), it seems Airflow can finally move away from a very complex, completely custom setup.py and from setuptools being abused by us in ways that its authors and the Packaging team did not originally anticipate.
> > > > >
> > > > > What problems does the change solve?
> > > > >
> > > > > My PR solves all the difficult requirements of our custom solution, but also (mostly thanks to standardisation efforts by the packaging team) it addresses a lot of problems we could not solve before.
> > > > >
> > > > > Happy to have a detailed discussion here, and a more detailed one in the PR (I added a lot more context and documentation - showing how this will work when we merge it).
> > > > > But here is the list of things such a move provides:
> > > > >
> > > > > * We are using the hatchling build backend, which follows the appropriate PEP standards and works with any "frontend" you choose to install and manage your local installation. (You can use modern Hatch, which is the counterpart to hatchling - highly recommended - but it will also work with just pip, poetry, flit, and any other standard-compliant tool in the future.) No habits of the contributors need to be changed, it will **just** work
> > > > >
> > > > > * our editable installation has been broken for some time (mostly because we were abusing setuptools and setup.py A LOT). See https://github.com/apache/airflow/issues/30764 . This change puts the shine back on being able to make editable install of airflow work as expected and getting a first-class experience for contributors with local virtualenvs
> > > > >
> > > > > * all Airflow package configuration is now merged into a single PEP-compliant pyproject.toml - no more setup.py, setup.cfg, MANIFEST.in.
> > > > >
> > > > > * the extras are refactored and organized into logical groups and start to make sense. I introduced new "editable" extras to allow you to easily install provider dependencies locally and reorganized devel extras to make it easy to understand what you should install in your editable environment to run tests.
> > > > > More importantly, those "devel" extras - while present in pyproject.toml - are stripped off (thanks to custom hooks) from the final package, so the final package has just the things that are important to our users
> > > > >
> > > > > * we use pre-commit to automatically take provider.yaml dependencies and merge them into pyproject.toml - thanks to that, provider.yaml remains the single source of truth for provider configuration, while it also allows "one local installation to develop them all together" - and in a very seamless way.
> > > > >
> > > > > * no more INSTALL_PROVIDERS_FROM_SOURCES hack when you install airflow for local development. I figured out a nice way to avoid installing pre-installed providers, and to make it super-easy to install dependencies of providers in editable installation (hint: `pip install -e .[editable_google]`). This is thanks to the custom build hooks that the PEPs standardized.
> > > > >
> > > > > * I also recommend Hatch as a Python/venv management tool and used it for testing - it's a great tool for managing both Python installations and virtualenvs. For many people, providing such a canonical way (while following the standards and not forcing Hatch) will be really great to simplify their local environment installation.
> > > > >
> > > > > * Hatchling supports reproducible builds out-of-the-box, which is great for security - and it will make our package generation much safer and easier to verify (as we do with our providers now).
> > > > >
> > > > > There are many more details and thoughts (and also some future possible developments) that I am aware of, but this mail is already too long, and we can discuss it in the thread/PR or future threads.
> > > > >
> > > > > Happy to take any questions, critique, proposals and feedback - I got quite deep into how modern package building works, so I likely made some mistakes / bad assumptions, or things can be improved, or maybe we can take other directions. It will take some time to discuss the details and merge, and if this one gets approved it's likely going to be targeted for Airflow 2.9.
> > > > >
> > > > > J.
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > > For additional commands, e-mail: dev-h...@airflow.apache.org