potiuk opened a new pull request, #36537:
URL: https://github.com/apache/airflow/pull/36537

   This PR changes Airflow's installation and build backend to use the new, standard Python ways of building Python applications.
   
   We've been trying to do this for quite a while. Airflow has traditionally used a complex and convoluted build process based on setuptools and an (extremely) custom setup.py file. That process survived the migration to Airflow 2.0, the split of the Airflow monorepo into Airflow and providers, the addition of pre-installed providers, and the switch of providers to flit (which follows the build standards).
   
   Until now, tooling in the Python ecosystem had not been able to fulfill our needs, and we refrained from developing our own. With the appearance of Hatch (managed by the Python Packaging Authority) and a few recent advancements in it, we are finally able to switch to the standard Python ways of managing project dependency configuration and project build setup (with a few customizations).
   
   This PR makes the Airflow build process follow these standard PEPs:
   
   * Airflow stores all of its build configuration in pyproject.toml following PEP 518, which allows any frontend (`pip`, `poetry`, `hatch`, `flit`, or whatever other frontend is used) to install the build dependencies required to install Airflow locally and to build distribution packages (sdist/wheel).
   
   * The Hatchling backend follows PEP 517, which defines a standard source tree format and build backend interface, so the build can be executed in a frontend-independent way.
   
   * We store all project metadata in pyproject.toml following PEP 621, which defines all the necessary project metadata fields.
   
   * We plug into Hatchling's "editable build" hooks following PEP 660. Hatchling internally builds an editable wheel that serves as an ephemeral, intermediate artifact passed from the backend to the frontend; the frontend uses this ephemeral wheel to make an editable installation of the project, suitable for fast iteration on code without reinstalling the package.
   
   Airflow has many provider packages in a single source tree, and we want to be able to install and develop Airflow and providers together. That makes it no small feat to support the case where packaging and dependencies must behave quite differently for an editable install (where you edit the sources directly) and for an installable package (where you want a separate Airflow package and separate provider packages). Fortunately, the standardisation efforts in the Python packaging community, and the tooling implementing them, have finally made it possible.
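The editable/standard difference can be pictured as a single project specification from which the published wheel's metadata is derived. The sketch below is hypothetical: the metadata dict shape, the `PREINSTALLED_PROVIDERS` list and the rule that extras starting with `devel` are dropped are assumptions made for illustration, not the actual Airflow build hook.

```python
# Hypothetical sketch of deriving "standard" wheel metadata from the editable spec.
# PREINSTALLED_PROVIDERS and the "devel" prefix rule are illustrative assumptions.
from copy import deepcopy

PREINSTALLED_PROVIDERS = [
    "apache-airflow-providers-http",
    "apache-airflow-providers-sqlite",
]


def standard_wheel_metadata(editable_metadata: dict) -> dict:
    """Return metadata for the PyPI wheel without mutating the editable metadata."""
    metadata = deepcopy(editable_metadata)
    # In the standard wheel, provider code no longer comes from the source tree,
    # so the pre-installed provider packages become real dependencies.
    metadata["dependencies"] = metadata["dependencies"] + [
        dep for dep in PREINSTALLED_PROVIDERS if dep not in metadata["dependencies"]
    ]
    # devel extras only make sense when working from sources, so drop them.
    metadata["optional-dependencies"] = {
        extra: deps
        for extra, deps in metadata["optional-dependencies"].items()
        if not extra.startswith("devel")
    }
    return metadata
```

Working on a deep copy keeps the editable view intact, mirroring the idea that both behaviours come from the same specification.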
   
   Some of the important ways this has been achieved:
   
   * pyproject.toml is generally maintained manually, but the part containing provider dependencies and bundle dependencies is automatically updated by a pre-commit hook whenever provider dependencies change.
   
   * We have dedicated (generated) `[devel_provider_*]` extras that, in editable mode, install only the provider dependencies (not the final provider packages). This makes it possible to install provider dependencies individually or in groups in the editable installation of Airflow without installing the provider packages (i.e. provider code is used directly from the sources of the editable Airflow installation).
   
   * We have some generated `[devel_*]` bundle extras that bundle together all or selected provider dependencies for installation in the CI image and in local editable virtualenv installations.
   
   * We use custom Hatchling build hooks (invoked through the standard PEP 517/PEP 660 build interface) that modify the "standard" wheel package on the fly while it is being prepared: they add the pre-installed provider package dependencies (which are not needed in the editable build) and remove all devel extras (which are not needed in the wheel package distributed via PyPI). This solves the conundrum of having different "editable" and "standard" behaviour while keeping a single project specification in pyproject.toml.
   
   * We added a description of how `Hatch` can be employed as a build frontend to easily manage a local virtualenv and install Airflow in editable mode, while keeping all properties of the installed application (including a working `airflow` CLI and package metadata discovery), as well as how to use the PEP-standard ways of building wheel and sdist packages.
   
   * We have a custom step (following the PEP standards) that injects Airflow-specific build steps: compiling the www assets and generating the git commit hash version displayed in the UI.
   
   * We also show how all this makes it easy for Airflow contributors to manage local virtualenvs and editable installations without vendor lock-in to particular build tools. Because Airflow follows the standard PEPs, anyone can install it locally in editable mode with any standards-compliant build frontend. Whether you use `pip`, `poetry`, `Hatch`, `flit` or any other frontend, local installation and package building work the same way, with both "editable" and "standard" package preparation handled by the `hatchling` backend.
   
   * Previously some of our extra names contained a ".", which is not a normalized extra name; `pip` and other tools automatically replaced it with "_". This change updates the extra names to contain "_" rather than ".". This should be fully backwards compatible: users will still be able to use ".", but it will be normalized to "_" in Airflow packages.
   
   * Some of the problematic extras (graphviz, docgen) have been moved out of the core extras into optional ones. graphviz in particular has been difficult to install on macOS ARM. This is slightly backwards incompatible, but we should treat it as a bugfix: the only feature Airflow loses is producing DAG output as an image, and installing graphviz brings it back. The difficulty of installing graphviz as a required dependency justifies this slight backwards-incompatible change.
   
   * Additionally, this change organizes the documentation around the extras 
and dependencies, explaining the reasoning behind all the different extras we 
have.
   
   * As a bonus (and this is what we used to test it all), we document how to use the Hatch frontend to:
   
     * manage multiple Python installations
     * manage multiple Python virtualenv environments
     * build Airflow packages for release management
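The generated `[devel_provider_*]` and `[devel_*]` bundle extras described in the list above could be produced by a pre-commit step along these lines. This is a hypothetical sketch: the provider mapping, the dependency pins, the `devel_all` bundle name and the TOML rendering are illustrative assumptions, not Airflow's actual pre-commit code. It also applies the "." to "_" extra-name rule described above.

```python
# Hypothetical sketch of a pre-commit step that regenerates provider extras in
# pyproject.toml. Provider names, pins, the "devel_all" bundle name and the TOML
# rendering are illustrative assumptions, not Airflow's actual code.

# Assumed input: provider id -> list of that provider's pip requirements.
PROVIDER_DEPENDENCIES: dict[str, list[str]] = {
    "cncf.kubernetes": ["kubernetes>=21.7.0"],
    "http": ["requests>=2.26.0"],
    "amazon": ["boto3>=1.28.0", "requests>=2.26.0"],
}


def normalize_extra(name: str) -> str:
    """Use '_' instead of '.' in extra names."""
    return name.replace(".", "_").lower()


def generate_devel_extras(provider_deps: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build one devel_provider_* extra per provider plus an all-providers bundle."""
    extras: dict[str, list[str]] = {}
    for provider, deps in sorted(provider_deps.items()):
        # Only the provider's dependencies: provider code itself comes from sources.
        extras[f"devel_provider_{normalize_extra(provider)}"] = sorted(set(deps))
    # Bundle extra: union of all provider dependencies, de-duplicated and sorted.
    extras["devel_all"] = sorted({dep for deps in provider_deps.values() for dep in deps})
    return extras


def render_toml(extras: dict[str, list[str]]) -> str:
    """Render extras as lines for a [project.optional-dependencies] table."""
    lines = []
    for name, deps in extras.items():
        quoted = ", ".join(f'"{dep}"' for dep in deps)
        lines.append(f"{name} = [{quoted}]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_toml(generate_devel_extras(PROVIDER_DEPENDENCIES)))
```

Generating the section from a single provider-dependency mapping is what lets the pre-commit hook keep pyproject.toml in sync whenever provider dependencies change.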
   
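The custom Airflow-specific build step mentioned above records the git commit hash so the UI can display it. A minimal, hypothetical sketch of the hash part follows (it does not cover compiling the www assets); the function names, the output file and the "unknown" fallback are illustrative assumptions. The command runner is injectable so the step can be exercised without a git checkout.

```python
# Hypothetical sketch of a build step that records the git commit hash for the UI.
# Function names, the output file and the "unknown" fallback are assumptions.
import subprocess
from pathlib import Path
from typing import Callable


def get_git_commit_hash(
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Return the current commit hash, or "unknown" outside a git checkout."""
    try:
        result = run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or this is not a git checkout (e.g. building from an sdist).
        return "unknown"
    return result.stdout.strip()


def write_git_version(target: Path, commit_hash: str) -> None:
    """Write the hash to a file packaged alongside the code."""
    target.write_text(commit_hash + "\n", encoding="utf-8")
```

Falling back to a placeholder instead of failing keeps the build working from sources that carry no git metadata.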