I personally think this is a great idea. I have been following the
Hatch project for a while and I am convinced it has a lot to offer
Airflow. The two big pros for me are its ease of use (backend and
frontend) and the security aspects it covers (reproducible builds, to
name one).

I will take a look at the PR later this week, but it definitely sounds
exciting.



On Tue 2 Jan 2024 at 20:26, Jarek Potiuk <ja...@potiuk.com> wrote:

> Hello everyone.
>
> TL;DR: I have a proposal to adopt Hatchling as a build backend (and
> recommend, but not require, Hatch as a frontend) for Airflow as our way
> of switching to a PEP-standard-compliant pyproject.toml way of
> installing Airflow (including local venvs) and building the Airflow
> package.
>
> I have a working implementation that needs polishing, a few less
> important decisions, and some rather simple TODOs. Here is the draft PR:
> https://github.com/apache/airflow/pull/36537
>
> I've spent the better part of the Xmas/New Year's break implementing
> it. This is something we've been discussing for, literally, years, and
> several people (including myself) have made several unsuccessful
> attempts in the past to standardise the Python packaging/build process
> for Airflow on modern, standards-driven tooling.
>
> I think I finally succeeded.
>
> In short, what it means:
>
> When this change is merged, Airflow will have a slick, modern,
> standards-compliant contributor experience: an editable installation
> that will **just work** with multiple build frontends, and that makes
> it very easy to install and manage local virtualenv(s) for
> contributing to Airflow. The extras structure and Airflow package
> configuration will be in one place (pyproject.toml), and it will be
> much easier to reason about our extras and dependencies. As a bonus,
> with tools like Hatch, contributors will get a canonical, recommended
> way of managing both Python installations and virtualenvs for Airflow
> development (but without being forced into a single frontend).
>
> From the user's perspective, Airflow packages will be more
> standardised, with just user extras defined. From the maintainers' and
> PMC members' perspective, we will get reproducible builds (similar to
> what we have now for providers), which means that it will be easier
> and more robust to verify the provenance of the packages (security!).
>
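
To make the reproducible-builds point concrete, here is a minimal Python sketch of the underlying idea: if every input to the archive is normalized (file order, timestamps, ownership), two independent builds give byte-identical output, so anyone can rebuild and compare hashes. This is only an illustration under my own assumptions, not how Hatchling or the Airflow release tooling actually does it; `build_archive` and the file names are hypothetical.

```python
import gzip
import hashlib
import io
import tarfile


def build_archive(files: dict[str, bytes], source_date_epoch: int = 0) -> bytes:
    """Pack files into a .tar.gz whose bytes depend only on the inputs."""
    raw = io.BytesIO()
    # Pin the gzip header timestamp so the compressed wrapper is stable.
    with gzip.GzipFile(fileobj=raw, mode="wb", mtime=source_date_epoch) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name in sorted(files):  # stable member order
                info = tarfile.TarInfo(name)
                info.size = len(files[name])
                info.mtime = source_date_epoch  # no wall-clock timestamps
                info.mode = 0o644
                info.uid = info.gid = 0  # no builder-specific ownership
                info.uname = info.gname = ""
                tar.addfile(info, io.BytesIO(files[name]))
    return raw.getvalue()


def sha256_hex(data: bytes) -> str:
    """Digest that a verifier would compare against the published one."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical package content, supplied in two different orders.
files = {"pkg/__init__.py": b"", "pkg/core.py": b"VALUE = 1\n"}
first = build_archive(files)
second = build_archive(dict(reversed(list(files.items()))))
print("identical builds:", sha256_hex(first) == sha256_hex(second))
```

A verifier only needs the sources and the published digest: rebuild, hash, compare.
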
> Why can we do it now when we could not do it before?
>
> This is mostly thanks to the Herculean efforts of the Python Packaging
> team (hats off to TP for being part of the team and leading a lot of
> standardisation efforts there). After a few years of relentless
> introduction and implementation of many PEPs and the release of new
> tooling (particularly Hatch, but also Flit, which we already use for
> providers), it seems Airflow can finally move away from a very complex,
> completely custom setup.py and from abusing setuptools in ways that
> its authors and the Packaging team did not originally anticipate.
>
> What problems does the change solve?
>
> My PR meets all the difficult requirements of our custom solution,
> and (mostly thanks to standardisation efforts by the packaging
> team) it also fixes a lot of problems we could not solve before.
>
> Happy to have a detailed discussion here, and a more detailed one in
> the PR (I added a lot more context and documentation showing how this
> will work when we merge it), but here is the list of things such a move
> provides:
>
> * We are using the hatchling build backend, which follows the
> appropriate PEP standards and works with any "frontend" you choose to
> install and manage your local installation. You can use modern Hatch,
> which is the counterpart to hatchling (highly recommended), but it
> will also work with just pip, poetry, flit, and any other
> standards-compliant tool in the future. No contributor habits need to
> change; it will **just** work.
>
> * Our editable installation has been broken for some time (mostly
> because we were abusing setuptools and setup.py A LOT). See
> https://github.com/apache/airflow/issues/30764 . This change makes
> editable installs of Airflow work as expected again and gives
> contributors a first-class experience with local virtualenvs.
>
> * All Airflow package configuration is now merged into a single,
> PEP-compliant pyproject.toml: no more setup.py, setup.cfg, or
> MANIFEST.in.
>
> * The extras are refactored and organized into logical groups and
> start to make sense. I introduced new "editable" extras that let you
> easily install provider dependencies locally, and reorganized the
> devel extras to make it easy to understand what you should install in
> your editable environment to run tests. More importantly, those "devel"
> extras, while present in pyproject.toml, are stripped (thanks to
> custom hooks) from the final package, so the final package has just
> the things that are important to our users.
>
> * We use pre-commit to automatically take the provider.yaml
> dependencies and merge them into pyproject.toml. Thanks to that,
> provider.yaml remains the single source of truth for provider
> configuration, while one local installation still lets you develop
> all providers together, in a very seamless way.
>
> * No more INSTALL_PROVIDERS_FROM_SOURCES hack when you install Airflow
> for local development. I figured out a nice way to avoid installing
> pre-installed providers, and to make it super easy to install the
> dependencies of providers in an editable installation (hint: `pip install
> -e .[editable_google]`). This is thanks to the custom build hooks the
> PEP standardized.
>
> * I also recommend Hatch as a Python/venv management tool and used it
> for testing. It's a great tool for managing both Python installations
> and virtualenvs. For many people, having such a canonical way (while
> following the standards and not forcing Hatch) will really simplify
> setting up their local environment.
>
> * Hatchling supports reproducible builds out-of-the-box, which is
> great for security - and it will make our package generation much
> safer and easier to verify (as we do with our providers now).
>
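
The extras handling described in the list above can be pictured with a small Python sketch: per-provider dependencies become `editable_<provider>` extras for local development, and "devel" extras are filtered out before the final package metadata is written. This is my own illustration, not the code from the PR; the function names, the `devel` prefix check, and the dependency strings are all hypothetical placeholders.

```python
def build_editable_extras(provider_deps: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn per-provider dependency lists into 'editable_<provider>' extras."""
    return {
        f"editable_{provider}": sorted(deps)
        for provider, deps in sorted(provider_deps.items())
    }


def strip_devel_extras(
    extras: dict[str, list[str]], prefix: str = "devel"
) -> dict[str, list[str]]:
    """Drop development-only extras so users only see what matters to them."""
    return {
        name: deps for name, deps in extras.items() if not name.startswith(prefix)
    }


# Placeholder data standing in for what provider.yaml files would supply.
provider_deps = {
    "google": ["example-google-sdk>=1.0"],
    "amazon": ["example-aws-sdk>=2.0"],
}

# Extras as they might look in the development-time pyproject.toml.
dev_extras = {
    "async": ["example-async-lib"],
    "devel": ["example-test-runner"],
    **build_editable_extras(provider_deps),
}

# Extras that would remain in the published package.
published_extras = strip_devel_extras(dev_extras)
print(sorted(dev_extras))
print(sorted(published_extras))
```

Keeping both steps as pure functions over plain dictionaries makes them easy to call from either a pre-commit script or a build hook.
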
> There are many more details and thoughts (and also some possible
> future developments) that I am aware of, but this mail is already
> too long, and we can discuss them in the thread/PR or in future
> threads.
>
> Happy to take any questions, critique, proposals and feedback. I got
> quite deep into how modern package building works, so I likely made
> some mistakes or bad assumptions, or things can be improved, or maybe
> we can take other directions. It will take some time to discuss
> details and merge, and if this one gets approved it's likely going to
> be targeted for Airflow 2.9.
>
> J.
>
