ashb commented on code in PR #58430: URL: https://github.com/apache/airflow/pull/58430#discussion_r2538486434
########## contributing-docs/13_airflow_dependencies_and_extras.rst: ########## @@ -18,6 +18,226 @@ Airflow dependencies ==================== +This document describes how we manage Apache Airflow dependencies, as well as how we make sure +that users can use Airflow both as an application and as a library when they are deploying +their own Airflow - using our constraints mechanism. + +Airflow ``pyproject.toml`` files and ``uv`` workspace +..................................................... + +Managing dependencies is an important part of maintaining Apache Airflow, we have more than 700 Review Comment: ```suggestion Managing dependencies is an important part of developing Apache Airflow, we have more than 700 ``` "Maintaining" Airflow describes what you do with a running Airflow deployment. ########## contributing-docs/13_airflow_dependencies_and_extras.rst: ########## @@ -18,6 +18,226 @@ Airflow dependencies ==================== +This document describes how we manage Apache Airflow dependencies, as well as how we make sure +that users can use Airflow both as an application and as a library when they are deploying +their own Airflow - using our constraints mechanism. + +Airflow ``pyproject.toml`` files and ``uv`` workspace +..................................................... + +Managing dependencies is an important part of maintaining Apache Airflow, we have more than 700 +dependencies, and often when you add a new feature that requires a new dependency, you should +also update the dependency list. This also happens when you want to add a new tool that is used +for development, testing, or building the documentation and this document describes how to do it. 
+ +When it comes to defining dependencies for any of the Apache Airflow distributions, we are following the +standard ``pyproject.toml`` format that has been defined in `PEP 518 <https://peps.python.org/pep-0518/>`_ +and `PEP 621 <https://peps.python.org/pep-0621/>`_, we are also using dependency groups defined in +`PEP 735 <https://peps.python.org/pep-0735/>`_ - particularly ``dev`` dependency group which we use in +all our ``pyproject.toml`` files to define development dependencies. + +We have more than 100 python distributions in Apache Airflow repository - including the main +``apache-airflow`` package, ``apache-airflow-providers-*`` packages, ``apache-airflow-core`` package, +``apache-airflow-task-sdk`` package, ``apache-airflow-ctl`` package, and several other packages. + +They are all connected together via ``uv`` workspace feature (workspace is defined in the root ``pyproject.toml`` +file of the repository in the ``apache-airflow`` distribution definition. The workspace feature allows us to +run ``uv sync`` at the top of the repository to install all packages in editable mode in the development +environment from all the 100+ distributions (!) and resolve the dependencies together, so that we know that +the dependencies have no conflicting requirements. Also the distributions are referring to each other via +name - which means that when you run locally ``uv sync``, the local version of the packages are used, not the +ones released on PyPI, which means that you can develop and test changes that span multiple packages at the +same time. This is a very powerful feature that allows us to maintain the complex ecosystem of Apache Airflow +distributions in a single monorepo, and allows us - for example to add new feature to common distributions used +by multiple providers and test them all together before releasing new versions of either of those pacckages + +Managing dependencies in ``pyproject.toml`` files +................................................. 
+ +Each of the ``pyproject.toml`` files in Apache Airflow repository defines dependencies in one of the +following sections: + +* ``[project.dependencies]`` - this section defines the required dependencies of the package. These + dependencies are installed when you install the package without any extras. +* ``[project.optional-dependencies]`` - this section defines optional dependencies (extras) of the package. + These dependencies are installed when you install the package with extras - for example + ``pip install apache-airflow[ssh]`` will install the ``ssh`` extra dependencies defined in this section. +* ``[dependency-group.dev]`` - this section defines development dependencies of the package. + These dependencies are installed when you run ``uv sync`` by default. when ``uv`` syncs sources with + local pyproject.toml it adds ``dev`` dependency group and package is installed in editable mode with + development dependencies. + + +Adding and modifying dependencies +................................. + +Adding and modifying dependencies in Apache Airflow is done by modifying the appropriate +``pyproject.toml`` file in the appropriate distribution. + +When you add a new dependency, you should make sure that: + +* The dependency is added to the right section (main dependencies, optional dependencies, or + development dependencies) + +* Some parts of those dependencies might be automatically generated (and overwritten) by our ``prek`` + hooks. Those are the necessary dependencies that ``prek`` hoks can figure out automatically by + analyzing the imports in the sources and structure of the project. We also have special case of + shared dependencies (described in `shared dependencies document <../shared/README.md>`__) where we + effectively "static-link" some libraries into multiple distributions, to avoid unnecessary coupling and + circular dependencies, and those dependencies contribute some dependencies automatically as well. 
+ Pay attention to comments such us example start and end of such generated block below. + All the dependencies between those comments are automatically generated and you should not modify + them manually. Tou might instead modify the root source of those dependencies - depending on the automation, + you can usually also add extra dependencies manually outside of those comment blocks + + .. code-block:: python + + # Automatically generated airflow optional dependencies (update_airflow_pyproject_toml.py) + # .... + # LOTS OF GENERATED DEPENDENCIES HERE + # .... + # End of automatically generated airflow optional dependencies + +* The version specifier is as open as possible (upper) while still allowing the package to install + and pass all tests. We very rarely upper-bind dependencies - only when there is a known + conflict with a new or upcoming version of a dependency that breaks the installation or tests + (and we always make a comment why we are upper-bounding a dependency). + +* Make sure to lower-bind any dependency you add. Usually we lower-bind dependencies to the + minimum version that is required for the package to work but in order to simplify the work of + resolvers such as ``pip``, we often lower-bind to higher (and newer) version than the absolute minimum + especially when the minimum version is very old. This is for example good practice in ``boto`` and related + packages, where new version of those packages are released frequently (almost daily) and there are many + versions that need to be considered by the resolver if the version is not new enough. + +* Make sure to run ``uv sync`` after modifying dependencies to make sure that there are no + conflicts between dependencies of different packages in the workspace. You can run it in multiple + ways - either from the root of the repository (which will sync all packages) or from the package + you modified (which will sync only that package and its dependencies). 
Also good idea might be to + run ``uv sync --all-packages --all-extras`` at the root of the repository to make sure that + all packages with all extras can be installed together without conflicts, but this might be sometimes + difficult and slow as some of the extras require some additional system level dependencies to be installed + (for example ``mysql`` or ``postgres`` extras require client libraries to be installed on the system). + +* Make sure to run all tests after modifying dependencies to make sure that nothing is broken + + +Referring to other Apache Airflow distributions in dependencies +............................................................... + +With having more than 100 distributions in the repository, it is often necessary to refer to +other distributions in the dependencies in order to use some common features or simply to use the +features that the other distribution provides. There are two ways of doing it: + +* Regular package linking with ``apache-airflow-*`` dependency +* Airflow "shared dependencies" mechanism - which is a bit of a custom hack for Airflow monorepo + that allows us to "static-link" some common dependencies into multiple distributions without + creating circular dependencies. + +We are not going to describe the shared dependencies mechanism here, please refer to the +`shared dependencies document <../shared/README.md>`__ for details, but there are certain rules +when it comes to referring to other Airflow distributions in dependencies - here are the important +rules to remember: + +* You can refer to other distributions in your dependencies - as usual using distribution name. 
For example, + if you are adding a dependency to ``apache-airflow-providers-common-compat`` package from + ``apache-airflow-providers-google``, you can just add ``apache-airflow-providers-common>=x.y.z`` to the + dependencies and when you run ``uv sync``, the local version of the package will be used automatically + (this is thanks to the workspace feature of ``uv`` that does great job of binding our monorepo together). + This is very useful when you are developing changes to multiple packages at the same time. Some of those + are added automatically by prek hooks - when it can detect such dependencies by analyzing imports in the + sources - then they are added automatically between the special comments mentioned above, but sometimes + (especially when such dependencies are not at the top-level imports) you might need to add them manually. Review Comment: You have already mentioned developing changes locally above. Repeating it here just makes the point harder to follow. ########## contributing-docs/13_airflow_dependencies_and_extras.rst: ########## @@ -18,6 +18,226 @@ Airflow dependencies ==================== +This document describes how we manage Apache Airflow dependencies, as well as how we make sure +that users can use Airflow both as an application and as a library when they are deploying +their own Airflow - using our constraints mechanism. + +Airflow ``pyproject.toml`` files and ``uv`` workspace +..................................................... + +Managing dependencies is an important part of maintaining Apache Airflow, we have more than 700 +dependencies, and often when you add a new feature that requires a new dependency, you should +also update the dependency list. This also happens when you want to add a new tool that is used +for development, testing, or building the documentation and this document describes how to do it. 
+ +When it comes to defining dependencies for any of the Apache Airflow distributions, we are following the +standard ``pyproject.toml`` format that has been defined in `PEP 518 <https://peps.python.org/pep-0518/>`_ +and `PEP 621 <https://peps.python.org/pep-0621/>`_, we are also using dependency groups defined in +`PEP 735 <https://peps.python.org/pep-0735/>`_ - particularly ``dev`` dependency group which we use in +all our ``pyproject.toml`` files to define development dependencies. + +We have more than 100 python distributions in Apache Airflow repository - including the main +``apache-airflow`` package, ``apache-airflow-providers-*`` packages, ``apache-airflow-core`` package, +``apache-airflow-task-sdk`` package, ``apache-airflow-ctl`` package, and several other packages. + +They are all connected together via ``uv`` workspace feature (workspace is defined in the root ``pyproject.toml`` +file of the repository in the ``apache-airflow`` distribution definition. The workspace feature allows us to +run ``uv sync`` at the top of the repository to install all packages in editable mode in the development +environment from all the 100+ distributions (!) and resolve the dependencies together, so that we know that +the dependencies have no conflicting requirements. Also the distributions are referring to each other via +name - which means that when you run locally ``uv sync``, the local version of the packages are used, not the +ones released on PyPI, which means that you can develop and test changes that span multiple packages at the +same time. This is a very powerful feature that allows us to maintain the complex ecosystem of Apache Airflow +distributions in a single monorepo, and allows us - for example to add new feature to common distributions used +by multiple providers and test them all together before releasing new versions of either of those pacckages + +Managing dependencies in ``pyproject.toml`` files +................................................. 
+ +Each of the ``pyproject.toml`` files in Apache Airflow repository defines dependencies in one of the +following sections: + +* ``[project.dependencies]`` - this section defines the required dependencies of the package. These + dependencies are installed when you install the package without any extras. +* ``[project.optional-dependencies]`` - this section defines optional dependencies (extras) of the package. + These dependencies are installed when you install the package with extras - for example + ``pip install apache-airflow[ssh]`` will install the ``ssh`` extra dependencies defined in this section. +* ``[dependency-group.dev]`` - this section defines development dependencies of the package. + These dependencies are installed when you run ``uv sync`` by default. when ``uv`` syncs sources with + local pyproject.toml it adds ``dev`` dependency group and package is installed in editable mode with + development dependencies. + + +Adding and modifying dependencies +................................. + +Adding and modifying dependencies in Apache Airflow is done by modifying the appropriate +``pyproject.toml`` file in the appropriate distribution. + +When you add a new dependency, you should make sure that: + +* The dependency is added to the right section (main dependencies, optional dependencies, or + development dependencies) + +* Some parts of those dependencies might be automatically generated (and overwritten) by our ``prek`` + hooks. Those are the necessary dependencies that ``prek`` hoks can figure out automatically by + analyzing the imports in the sources and structure of the project. We also have special case of + shared dependencies (described in `shared dependencies document <../shared/README.md>`__) where we + effectively "static-link" some libraries into multiple distributions, to avoid unnecessary coupling and + circular dependencies, and those dependencies contribute some dependencies automatically as well. 
+ Pay attention to comments such us example start and end of such generated block below. + All the dependencies between those comments are automatically generated and you should not modify + them manually. Tou might instead modify the root source of those dependencies - depending on the automation, Review Comment: ```suggestion them manually. You might instead modify the root source of those dependencies - depending on the automation, ``` ########## contributing-docs/13_airflow_dependencies_and_extras.rst: ########## @@ -18,6 +18,226 @@ Airflow dependencies ==================== +This document describes how we manage Apache Airflow dependencies, as well as how we make sure +that users can use Airflow both as an application and as a library when they are deploying +their own Airflow - using our constraints mechanism. + +Airflow ``pyproject.toml`` files and ``uv`` workspace +..................................................... + +Managing dependencies is an important part of maintaining Apache Airflow, we have more than 700 +dependencies, and often when you add a new feature that requires a new dependency, you should +also update the dependency list. This also happens when you want to add a new tool that is used +for development, testing, or building the documentation and this document describes how to do it. + +When it comes to defining dependencies for any of the Apache Airflow distributions, we are following the +standard ``pyproject.toml`` format that has been defined in `PEP 518 <https://peps.python.org/pep-0518/>`_ +and `PEP 621 <https://peps.python.org/pep-0621/>`_, we are also using dependency groups defined in +`PEP 735 <https://peps.python.org/pep-0735/>`_ - particularly ``dev`` dependency group which we use in +all our ``pyproject.toml`` files to define development dependencies. 
+ +We have more than 100 python distributions in Apache Airflow repository - including the main +``apache-airflow`` package, ``apache-airflow-providers-*`` packages, ``apache-airflow-core`` package, +``apache-airflow-task-sdk`` package, ``apache-airflow-ctl`` package, and several other packages. + +They are all connected together via ``uv`` workspace feature (workspace is defined in the root ``pyproject.toml`` +file of the repository in the ``apache-airflow`` distribution definition. The workspace feature allows us to +run ``uv sync`` at the top of the repository to install all packages in editable mode in the development +environment from all the 100+ distributions (!) and resolve the dependencies together, so that we know that +the dependencies have no conflicting requirements. Also the distributions are referring to each other via +name - which means that when you run locally ``uv sync``, the local version of the packages are used, not the +ones released on PyPI, which means that you can develop and test changes that span multiple packages at the +same time. This is a very powerful feature that allows us to maintain the complex ecosystem of Apache Airflow +distributions in a single monorepo, and allows us - for example to add new feature to common distributions used +by multiple providers and test them all together before releasing new versions of either of those pacckages + +Managing dependencies in ``pyproject.toml`` files +................................................. + +Each of the ``pyproject.toml`` files in Apache Airflow repository defines dependencies in one of the +following sections: + +* ``[project.dependencies]`` - this section defines the required dependencies of the package. These + dependencies are installed when you install the package without any extras. +* ``[project.optional-dependencies]`` - this section defines optional dependencies (extras) of the package. 
+ These dependencies are installed when you install the package with extras - for example + ``pip install apache-airflow[ssh]`` will install the ``ssh`` extra dependencies defined in this section. +* ``[dependency-group.dev]`` - this section defines development dependencies of the package. + These dependencies are installed when you run ``uv sync`` by default. when ``uv`` syncs sources with + local pyproject.toml it adds ``dev`` dependency group and package is installed in editable mode with + development dependencies. + + +Adding and modifying dependencies +................................. + +Adding and modifying dependencies in Apache Airflow is done by modifying the appropriate +``pyproject.toml`` file in the appropriate distribution. + +When you add a new dependency, you should make sure that: + +* The dependency is added to the right section (main dependencies, optional dependencies, or + development dependencies) + +* Some parts of those dependencies might be automatically generated (and overwritten) by our ``prek`` + hooks. Those are the necessary dependencies that ``prek`` hoks can figure out automatically by + analyzing the imports in the sources and structure of the project. We also have special case of + shared dependencies (described in `shared dependencies document <../shared/README.md>`__) where we + effectively "static-link" some libraries into multiple distributions, to avoid unnecessary coupling and + circular dependencies, and those dependencies contribute some dependencies automatically as well. + Pay attention to comments such us example start and end of such generated block below. + All the dependencies between those comments are automatically generated and you should not modify + them manually. Tou might instead modify the root source of those dependencies - depending on the automation, + you can usually also add extra dependencies manually outside of those comment blocks + + .. 
code-block:: python + + # Automatically generated airflow optional dependencies (update_airflow_pyproject_toml.py) + # .... + # LOTS OF GENERATED DEPENDENCIES HERE + # .... + # End of automatically generated airflow optional dependencies + +* The version specifier is as open as possible (upper) while still allowing the package to install + and pass all tests. We very rarely upper-bind dependencies - only when there is a known + conflict with a new or upcoming version of a dependency that breaks the installation or tests + (and we always make a comment why we are upper-bounding a dependency). + +* Make sure to lower-bind any dependency you add. Usually we lower-bind dependencies to the + minimum version that is required for the package to work but in order to simplify the work of + resolvers such as ``pip``, we often lower-bind to higher (and newer) version than the absolute minimum + especially when the minimum version is very old. This is for example good practice in ``boto`` and related + packages, where new version of those packages are released frequently (almost daily) and there are many + versions that need to be considered by the resolver if the version is not new enough. + +* Make sure to run ``uv sync`` after modifying dependencies to make sure that there are no + conflicts between dependencies of different packages in the workspace. You can run it in multiple + ways - either from the root of the repository (which will sync all packages) or from the package + you modified (which will sync only that package and its dependencies). 
Also good idea might be to + run ``uv sync --all-packages --all-extras`` at the root of the repository to make sure that + all packages with all extras can be installed together without conflicts, but this might be sometimes + difficult and slow as some of the extras require some additional system level dependencies to be installed + (for example ``mysql`` or ``postgres`` extras require client libraries to be installed on the system). + +* Make sure to run all tests after modifying dependencies to make sure that nothing is broken + + +Referring to other Apache Airflow distributions in dependencies +............................................................... + +With having more than 100 distributions in the repository, it is often necessary to refer to +other distributions in the dependencies in order to use some common features or simply to use the +features that the other distribution provides. There are two ways of doing it: + +* Regular package linking with ``apache-airflow-*`` dependency +* Airflow "shared dependencies" mechanism - which is a bit of a custom hack for Airflow monorepo + that allows us to "static-link" some common dependencies into multiple distributions without + creating circular dependencies. + +We are not going to describe the shared dependencies mechanism here, please refer to the +`shared dependencies document <../shared/README.md>`__ for details, but there are certain rules +when it comes to referring to other Airflow distributions in dependencies - here are the important +rules to remember: + +* You can refer to other distributions in your dependencies - as usual using distribution name. 
For example, + if you are adding a dependency to ``apache-airflow-providers-common-compat`` package from + ``apache-airflow-providers-google``, you can just add ``apache-airflow-providers-common>=x.y.z`` to the + dependencies and when you run ``uv sync``, the local version of the package will be used automatically + (this is thanks to the workspace feature of ``uv`` that does great job of binding our monorepo together). + This is very useful when you are developing changes to multiple packages at the same time. Some of those + are added automatically by prek hooks - when it can detect such dependencies by analyzing imports in the + sources - then they are added automatically between the special comments mentioned above, but sometimes + (especially when such dependencies are not at the top-level imports) you might need to add them manually. + +* Never update distribution versions on your own. It is **entirely** up to the Release Manager to bump the + version of distributions that are defined as ``project.version``. This goes almost without an exception + and any diversions from this rule should be discussed at ``#release-management`` channel in Airflow Slack + beforehand. The only exception to this rule is when you are adding a new distribution to the repository - + in that case you should set the initial version of the distribution - usually ``0.0.1``, ``0.1.0`` + or ``1.0.0`` depending on the maturity of the package. But still you should discuss it in the channel. + +* Sometimes, when you add a new feature to a common distribution, you might add a feature to it or + change the API in the way that other packages can use it. This is especially true for common packages such as + ``apache-airflow-providers-common-compat``, but can happen for other packages (for example + ``apache-airflow-providers-apache-beam`` is used by ``apache-airflow-providers-google`` to use ``Apache Beam`` + hooks to communicate with Google Dataflow). 
In such case, when you are adding a feature to a common package + remember that the feature you just add. will only be released in the **FUTURE** release of such common + package and you cannot add ``>==x.y.z`` dependency to it where ``x.y.z`` is the version you are + going to release in the future. Ultimately, this should happen (and happens) when the Release Manager prepares + both packages together. Let us repeat - such changes in versions between different airflow package should + **NOT** be added to the dependencies manually by the contributor. They should **exclusively** be added by + the Release Manager. when preparing the release of **both** packages together. + We have a custom mechanism to support such additions, where it is contributor's responsibility to mark + dependency with a special comment - simply communicating with the Release Manager that such dependency + should be updated to the next version when the release is prepared. If you see such a need to use newly + added feature and using it at the same time in a different distribution - make sure to add this comment + in the line where dependency you want to use the new feature from is defined: + + .. code-block:: python + + # some deps + "apache-airflow-SOMETHING>1.0.0", # use next version Review Comment: Is `use next version` an instruction to a human, or a precise string that CI picks up and must match? ########## contributing-docs/13_airflow_dependencies_and_extras.rst: ########## @@ -18,6 +18,226 @@ Airflow dependencies ==================== +This document describes how we manage Apache Airflow dependencies, as well as how we make sure +that users can use Airflow both as an application and as a library when they are deploying +their own Airflow - using our constraints mechanism. + +Airflow ``pyproject.toml`` files and ``uv`` workspace +..................................................... 
+ +Managing dependencies is an important part of maintaining Apache Airflow, we have more than 700 +dependencies, and often when you add a new feature that requires a new dependency, you should +also update the dependency list. This also happens when you want to add a new tool that is used +for development, testing, or building the documentation and this document describes how to do it. + +When it comes to defining dependencies for any of the Apache Airflow distributions, we are following the +standard ``pyproject.toml`` format that has been defined in `PEP 518 <https://peps.python.org/pep-0518/>`_ +and `PEP 621 <https://peps.python.org/pep-0621/>`_, we are also using dependency groups defined in +`PEP 735 <https://peps.python.org/pep-0735/>`_ - particularly ``dev`` dependency group which we use in +all our ``pyproject.toml`` files to define development dependencies. + +We have more than 100 python distributions in Apache Airflow repository - including the main +``apache-airflow`` package, ``apache-airflow-providers-*`` packages, ``apache-airflow-core`` package, +``apache-airflow-task-sdk`` package, ``apache-airflow-ctl`` package, and several other packages. + +They are all connected together via ``uv`` workspace feature (workspace is defined in the root ``pyproject.toml`` +file of the repository in the ``apache-airflow`` distribution definition. The workspace feature allows us to +run ``uv sync`` at the top of the repository to install all packages in editable mode in the development +environment from all the 100+ distributions (!) and resolve the dependencies together, so that we know that +the dependencies have no conflicting requirements. Also the distributions are referring to each other via +name - which means that when you run locally ``uv sync``, the local version of the packages are used, not the +ones released on PyPI, which means that you can develop and test changes that span multiple packages at the +same time. 
This is a very powerful feature that allows us to maintain the complex ecosystem of Apache Airflow +distributions in a single monorepo, and allows us - for example to add new feature to common distributions used +by multiple providers and test them all together before releasing new versions of either of those pacckages + +Managing dependencies in ``pyproject.toml`` files +................................................. + +Each of the ``pyproject.toml`` files in Apache Airflow repository defines dependencies in one of the +following sections: + +* ``[project.dependencies]`` - this section defines the required dependencies of the package. These + dependencies are installed when you install the package without any extras. +* ``[project.optional-dependencies]`` - this section defines optional dependencies (extras) of the package. + These dependencies are installed when you install the package with extras - for example + ``pip install apache-airflow[ssh]`` will install the ``ssh`` extra dependencies defined in this section. +* ``[dependency-group.dev]`` - this section defines development dependencies of the package. + These dependencies are installed when you run ``uv sync`` by default. when ``uv`` syncs sources with + local pyproject.toml it adds ``dev`` dependency group and package is installed in editable mode with + development dependencies. + + +Adding and modifying dependencies +................................. + +Adding and modifying dependencies in Apache Airflow is done by modifying the appropriate +``pyproject.toml`` file in the appropriate distribution. + +When you add a new dependency, you should make sure that: + +* The dependency is added to the right section (main dependencies, optional dependencies, or + development dependencies) + +* Some parts of those dependencies might be automatically generated (and overwritten) by our ``prek`` + hooks. 
Those are the necessary dependencies that ``prek`` hoks can figure out automatically by + analyzing the imports in the sources and structure of the project. We also have special case of + shared dependencies (described in `shared dependencies document <../shared/README.md>`__) where we + effectively "static-link" some libraries into multiple distributions, to avoid unnecessary coupling and + circular dependencies, and those dependencies contribute some dependencies automatically as well. + Pay attention to comments such us example start and end of such generated block below. + All the dependencies between those comments are automatically generated and you should not modify + them manually. Tou might instead modify the root source of those dependencies - depending on the automation, + you can usually also add extra dependencies manually outside of those comment blocks + + .. code-block:: python + + # Automatically generated airflow optional dependencies (update_airflow_pyproject_toml.py) + # .... + # LOTS OF GENERATED DEPENDENCIES HERE + # .... + # End of automatically generated airflow optional dependencies + +* The version specifier is as open as possible (upper) while still allowing the package to install + and pass all tests. We very rarely upper-bind dependencies - only when there is a known + conflict with a new or upcoming version of a dependency that breaks the installation or tests + (and we always make a comment why we are upper-bounding a dependency). + +* Make sure to lower-bind any dependency you add. Usually we lower-bind dependencies to the + minimum version that is required for the package to work but in order to simplify the work of + resolvers such as ``pip``, we often lower-bind to higher (and newer) version than the absolute minimum + especially when the minimum version is very old. 
This is for example good practice in ``boto`` and related + packages, where new version of those packages are released frequently (almost daily) and there are many + versions that need to be considered by the resolver if the version is not new enough. + +* Make sure to run ``uv sync`` after modifying dependencies to make sure that there are no + conflicts between dependencies of different packages in the workspace. You can run it in multiple + ways - either from the root of the repository (which will sync all packages) or from the package + you modified (which will sync only that package and its dependencies). Also good idea might be to + run ``uv sync --all-packages --all-extras`` at the root of the repository to make sure that + all packages with all extras can be installed together without conflicts, but this might be sometimes + difficult and slow as some of the extras require some additional system level dependencies to be installed + (for example ``mysql`` or ``postgres`` extras require client libraries to be installed on the system). + +* Make sure to run all tests after modifying dependencies to make sure that nothing is broken + + +Referring to other Apache Airflow distributions in dependencies +............................................................... + +With having more than 100 distributions in the repository, it is often necessary to refer to +other distributions in the dependencies in order to use some common features or simply to use the +features that the other distribution provides. There are two ways of doing it: + +* Regular package linking with ``apache-airflow-*`` dependency +* Airflow "shared dependencies" mechanism - which is a bit of a custom hack for Airflow monorepo + that allows us to "static-link" some common dependencies into multiple distributions without + creating circular dependencies. 
+ +We are not going to describe the shared dependencies mechanism here, please refer to the +`shared dependencies document <../shared/README.md>`__ for details, but there are certain rules +when it comes to referring to other Airflow distributions in dependencies - here are the important +rules to remember: + +* You can refer to other distributions in your dependencies - as usual using distribution name. For example, + if you are adding a dependency to ``apache-airflow-providers-common-compat`` package from + ``apache-airflow-providers-google``, you can just add ``apache-airflow-providers-common>=x.y.z`` to the + dependencies and when you run ``uv sync``, the local version of the package will be used automatically + (this is thanks to the workspace feature of ``uv`` that does great job of binding our monorepo together). + This is very useful when you are developing changes to multiple packages at the same time. Some of those + are added automatically by prek hooks - when it can detect such dependencies by analyzing imports in the + sources - then they are added automatically between the special comments mentioned above, but sometimes + (especially when such dependencies are not at the top-level imports) you might need to add them manually. + +* Never update distribution versions on your own. It is **entirely** up to the Release Manager to bump the + version of distributions that are defined as ``project.version``. This goes almost without an exception + and any diversions from this rule should be discussed at ``#release-management`` channel in Airflow Slack + beforehand. The only exception to this rule is when you are adding a new distribution to the repository - + in that case you should set the initial version of the distribution - usually ``0.0.1``, ``0.1.0`` + or ``1.0.0`` depending on the maturity of the package. But still you should discuss it in the channel. 
+ +* Sometimes, when you add a new feature to a common distribution, you might add a feature to it or + change the API in the way that other packages can use it. This is especially true for common packages such as + ``apache-airflow-providers-common-compat``, but can happen for other packages (for example + ``apache-airflow-providers-apache-beam`` is used by ``apache-airflow-providers-google`` to use ``Apache Beam`` + hooks to communicate with Google Dataflow). In such case, when you are adding a feature to a common package + remember that the feature you just add. will only be released in the **FUTURE** release of such common + package and you cannot add ``>==x.y.z`` dependency to it where ``x.y.z`` is the version you are + going to release in the future. Ultimately, this should happen (and happens) when the Release Manager prepares + both packages together. Let us repeat - such changes in versions between different airflow package should + **NOT** be added to the dependencies manually by the contributor. They should **exclusively** be added by + the Release Manager. when preparing the release of **both** packages together. + We have a custom mechanism to support such additions, where it is contributor's responsibility to mark + dependency with a special comment - simply communicating with the Release Manager that such dependency + should be updated to the next version when the release is prepared. If you see such a need to use newly + added feature and using it at the same time in a different distribution - make sure to add this comment + in the line where dependency you want to use the new feature from is defined: + + .. code-block:: python + + # some deps Review Comment: At first I thought "some deps" was an important comment. ```suggestion # ... # other deps here # ... 
``` ########## dev/breeze/tests/test_selective_checks.py: ########## @@ -2779,3 +2779,169 @@ def test_testable_providers_integrations_excludes_arm_disabled_on_arm(): assert "postgres" in result assert "trino" not in result assert "ydb" not in result + + +@patch("airflow_breeze.utils.selective_checks.run_command") +def test_provider_dependency_bump_check_no_changes(mock_run_command): + """Test that provider dependency bump check passes when no pyproject.toml files are changed.""" + selective_checks = SelectiveChecks( + files=("some_other_file.py",), + commit_ref=NEUTRAL_COMMIT, + pr_labels=(), + github_event=GithubEvents.PULL_REQUEST, + default_branch="main", + ) + result = selective_checks.provider_dependency_bump + assert result is False + + +@patch("airflow_breeze.utils.selective_checks.run_command") +def test_provider_dependency_bump_check_fails_on_provider_version_bump(mock_run_command): + """Test that provider dependency bump check fails when provider version is bumped without label.""" + old_toml = """ +[project] +dependencies = [ + "apache-airflow-providers-common-sql>=1.0.0", +] +""" + new_toml = """ +[project] +dependencies = [ + "apache-airflow-providers-common-sql>=1.1.0", +] +""" + + def side_effect(*args, **kwargs): + result = Mock() + result.returncode = 0 + if "^:" in args[0][2]: + result.stdout = old_toml + else: + result.stdout = new_toml + return result + + mock_run_command.side_effect = side_effect + + with pytest.raises(SystemExit): + SelectiveChecks( + files=("providers/amazon/pyproject.toml",), + commit_ref=NEUTRAL_COMMIT, + pr_labels=(), + github_event=GithubEvents.PULL_REQUEST, + default_branch="main", + ).provider_dependency_bump + + +@patch("airflow_breeze.utils.selective_checks.run_command") +def test_provider_dependency_bump_check_passes_with_label(mock_run_command): + """Test that provider dependency bump check passes when label is set.""" + old_toml = """ +[project] +dependencies = [ + "apache-airflow-providers-common-sql>=1.0.0", +] 
+""" + new_toml = """ +[project] +dependencies = [ + "apache-airflow-providers-common-sql>=1.1.0", +] +""" + + def side_effect(*args, **kwargs): + result = Mock() + result.returncode = 0 + if "^:" in args[0][2]: + result.stdout = old_toml + else: + result.stdout = new_toml + return result + + mock_run_command.side_effect = side_effect + + selective_checks = SelectiveChecks( + files=("providers/amazon/pyproject.toml",), + commit_ref=NEUTRAL_COMMIT, + pr_labels=("allow provider dependency bump",), + github_event=GithubEvents.PULL_REQUEST, + default_branch="main", + ) + result = selective_checks.provider_dependency_bump + assert result is True + + +@patch("airflow_breeze.utils.selective_checks.run_command") +def test_provider_dependency_bump_check_passes_on_non_provider_dependency_changes(mock_run_command): + """Test that provider dependency bump check passes when non-provider dependencies change.""" + old_toml = """ +[project] +dependencies = [ + "apache-airflow>=2.10.0", + "boto3>=1.37.0", +] +""" + new_toml = """ +[project] +dependencies = [ + "apache-airflow>=2.10.0", + "boto3>=1.38.0", +] +""" + + def side_effect(*args, **kwargs): + result = Mock() + result.returncode = 0 + if "^:" in args[0][2]: + result.stdout = old_toml + else: + result.stdout = new_toml + return result + + mock_run_command.side_effect = side_effect + + selective_checks = SelectiveChecks( + files=("providers/amazon/pyproject.toml",), + commit_ref=NEUTRAL_COMMIT, + pr_labels=(), + github_event=GithubEvents.PULL_REQUEST, + default_branch="main", + ) + result = selective_checks.provider_dependency_bump + assert result is False + + +@patch("airflow_breeze.utils.selective_checks.run_command") +def test_provider_dependency_bump_check_in_optional_dependencies(mock_run_command): Review Comment: Why are optional deps exempt from the check? If the goal is to stop things being used before they are released, then wouldn't these be equally as prone to version issues? 
########## contributing-docs/13_airflow_dependencies_and_extras.rst: ########## @@ -18,6 +18,226 @@ Airflow dependencies ==================== +Referring to other Apache Airflow distributions in dependencies +............................................................... 
+ +With having more than 100 distributions in the repository, it is often necessary to refer to +other distributions in the dependencies in order to use some common features or simply to use the +features that the other distribution provides. There are two ways of doing it: Review Comment: This is the third time we mention how many dists we have. ```suggestion It is often necessary to refer to other distributions in the dependencies. There are two ways of doing it: ``` ########## dev/breeze/tests/test_selective_checks.py: ########## @@ -2779,3 +2779,169 @@ def test_testable_providers_integrations_excludes_arm_disabled_on_arm(): assert "postgres" in result assert "trino" not in result assert "ydb" not in result + + +@patch("airflow_breeze.utils.selective_checks.run_command") +def test_provider_dependency_bump_check_no_changes(mock_run_command): + """Test that provider dependency bump check passes when no pyproject.toml files are changed.""" + selective_checks = SelectiveChecks( + files=("some_other_file.py",), + commit_ref=NEUTRAL_COMMIT, + pr_labels=(), + github_event=GithubEvents.PULL_REQUEST, + default_branch="main", + ) + result = selective_checks.provider_dependency_bump + assert result is False + + +@patch("airflow_breeze.utils.selective_checks.run_command") +def test_provider_dependency_bump_check_fails_on_provider_version_bump(mock_run_command): Review Comment: Is this a common enough occurrence that we need to bake it into our CI? ########## contributing-docs/13_airflow_dependencies_and_extras.rst: ########## @@ -18,6 +18,226 @@ Airflow dependencies ==================== +This document describes how we manage Apache Airflow dependencies, as well as how we make sure +that users can use Airflow both as an application and as a library when they are deploying +their own Airflow - using our constraints mechanism. + +Airflow ``pyproject.toml`` files and ``uv`` workspace +..................................................... 
+ +Referring to other Apache Airflow distributions in dependencies +............................................................... 
+ +* Sometimes, when you add a new feature to a common distribution, you might add a feature to it or + change the API in the way that other packages can use it. This is especially true for common packages such as + ``apache-airflow-providers-common-compat``, but can happen for other packages (for example + ``apache-airflow-providers-apache-beam`` is used by ``apache-airflow-providers-google`` to use ``Apache Beam`` + hooks to communicate with Google Dataflow). In such case, when you are adding a feature to a common package + remember that the feature you just add. will only be released in the **FUTURE** release of such common Review Comment: ```suggestion remember that the feature you just add will only be released in the **FUTURE** release of such common ``` ########## contributing-docs/13_airflow_dependencies_and_extras.rst: ########## @@ -18,6 +18,226 @@ Airflow dependencies ==================== +This document describes how we manage Apache Airflow dependencies, as well as how we make sure +that users can use Airflow both as an application and as a library when they are deploying +their own Airflow - using our constraints mechanism. + +Airflow ``pyproject.toml`` files and ``uv`` workspace +..................................................... + +Managing dependencies is an important part of maintaining Apache Airflow, we have more than 700 +dependencies, and often when you add a new feature that requires a new dependency, you should +also update the dependency list. This also happens when you want to add a new tool that is used +for development, testing, or building the documentation and this document describes how to do it. 
+ +When it comes to defining dependencies for any of the Apache Airflow distributions, we are following the +standard ``pyproject.toml`` format that has been defined in `PEP 518 <https://peps.python.org/pep-0518/>`_ +and `PEP 621 <https://peps.python.org/pep-0621/>`_, we are also using dependency groups defined in +`PEP 735 <https://peps.python.org/pep-0735/>`_ - particularly ``dev`` dependency group which we use in +all our ``pyproject.toml`` files to define development dependencies. + +We have more than 100 python distributions in Apache Airflow repository - including the main +``apache-airflow`` package, ``apache-airflow-providers-*`` packages, ``apache-airflow-core`` package, +``apache-airflow-task-sdk`` package, ``apache-airflow-ctl`` package, and several other packages. + +They are all connected together via ``uv`` workspace feature (workspace is defined in the root ``pyproject.toml`` +file of the repository in the ``apache-airflow`` distribution definition. The workspace feature allows us to +run ``uv sync`` at the top of the repository to install all packages in editable mode in the development +environment from all the 100+ distributions (!) and resolve the dependencies together, so that we know that +the dependencies have no conflicting requirements. Also the distributions are referring to each other via +name - which means that when you run locally ``uv sync``, the local version of the packages are used, not the +ones released on PyPI, which means that you can develop and test changes that span multiple packages at the +same time. This is a very powerful feature that allows us to maintain the complex ecosystem of Apache Airflow +distributions in a single monorepo, and allows us - for example to add new feature to common distributions used +by multiple providers and test them all together before releasing new versions of either of those pacckages + +Managing dependencies in ``pyproject.toml`` files +................................................. 
+ +Each of the ``pyproject.toml`` files in the Apache Airflow repository defines dependencies in one of the +following sections: + +* ``[project.dependencies]`` - this section defines the required dependencies of the package. These + dependencies are installed when you install the package without any extras. +* ``[project.optional-dependencies]`` - this section defines optional dependencies (extras) of the package. + These dependencies are installed when you install the package with extras - for example + ``pip install apache-airflow[ssh]`` will install the ``ssh`` extra dependencies defined in this section. +* ``[dependency-groups]`` (``dev`` group) - this section defines development dependencies of the package. + These dependencies are installed when you run ``uv sync`` by default: when ``uv`` syncs sources with + the local ``pyproject.toml``, it adds the ``dev`` dependency group and the package is installed in editable mode with + development dependencies. + + +Adding and modifying dependencies +................................. + +Adding and modifying dependencies in Apache Airflow is done by modifying the appropriate +``pyproject.toml`` file in the appropriate distribution. + +When you add a new dependency, you should make sure that: + +* The dependency is added to the right section (main dependencies, optional dependencies, or + development dependencies) + +* Some parts of those dependencies might be automatically generated (and overwritten) by our ``prek`` + hooks. Those are the necessary dependencies that ``prek`` hooks can figure out automatically by + analyzing the imports in the sources and structure of the project. We also have a special case of + shared dependencies (described in `shared dependencies document <../shared/README.md>`__) where we + effectively "static-link" some libraries into multiple distributions, to avoid unnecessary coupling and + circular dependencies, and those dependencies contribute some dependencies automatically as well. 
+ Pay attention to comments such as the example start and end markers of a generated block shown below. + All the dependencies between those comments are automatically generated and you should not modify + them manually. You might instead modify the root source of those dependencies - depending on the automation, + you can usually also add extra dependencies manually outside of those comment blocks. + + .. code-block:: toml + + # Automatically generated airflow optional dependencies (update_airflow_pyproject_toml.py) + # .... + # LOTS OF GENERATED DEPENDENCIES HERE + # .... + # End of automatically generated airflow optional dependencies + +* The version specifier is as open as possible on the upper end while still allowing the package to install + and pass all tests. We very rarely upper-bind dependencies - only when there is a known + conflict with a new or upcoming version of a dependency that breaks the installation or tests + (and we always add a comment explaining why we are upper-binding a dependency). + +* Make sure to lower-bind any dependency you add. Usually we lower-bind dependencies to the + minimum version that is required for the package to work, but in order to simplify the work of + resolvers such as ``pip``, we often lower-bind to a higher (and newer) version than the absolute minimum, + especially when the minimum version is very old. This is for example good practice in ``boto`` and related + packages, where new versions of those packages are released frequently (almost daily) and there are many + versions that need to be considered by the resolver if the version is not new enough. + +* Make sure to run ``uv sync`` after modifying dependencies to make sure that there are no + conflicts between dependencies of different packages in the workspace. You can run it in multiple + ways - either from the root of the repository (which will sync all packages) or from the package + you modified (which will sync only that package and its dependencies). 
It might also be a good idea to + run ``uv sync --all-packages --all-extras`` at the root of the repository to make sure that + all packages with all extras can be installed together without conflicts, but this might sometimes be + difficult and slow as some of the extras require some additional system level dependencies to be installed + (for example ``mysql`` or ``postgres`` extras require client libraries to be installed on the system). + +* Make sure to run all tests after modifying dependencies to make sure that nothing is broken. + + +Referring to other Apache Airflow distributions in dependencies +............................................................... + +With more than 100 distributions in the repository, it is often necessary to refer to +other distributions in the dependencies in order to use some common features or simply to use the +features that the other distribution provides. There are two ways of doing it: + +* Regular package linking with ``apache-airflow-*`` dependency +* Airflow "shared dependencies" mechanism - which is a bit of a custom hack for Airflow monorepo + that allows us to "static-link" some common dependencies into multiple distributions without + creating circular dependencies. + +We are not going to describe the shared dependencies mechanism here, please refer to the +`shared dependencies document <../shared/README.md>`__ for details, but there are certain rules +when it comes to referring to other Airflow distributions in dependencies - here are the important +rules to remember: + +* You can refer to other distributions in your dependencies - as usual, using the distribution name. 
For example, + if you are adding a dependency on the ``apache-airflow-providers-common-compat`` package from + ``apache-airflow-providers-google``, you can just add ``apache-airflow-providers-common-compat>=x.y.z`` to the + dependencies and when you run ``uv sync``, the local version of the package will be used automatically + (this is thanks to the workspace feature of ``uv`` that does a great job of binding our monorepo together). + This is very useful when you are developing changes to multiple packages at the same time. Some of those + are added automatically by ``prek`` hooks - when the hooks can detect such dependencies by analyzing imports in the + sources, they are added automatically between the special comments mentioned above - but sometimes + (especially when such dependencies are not at the top-level imports) you might need to add them manually. + +* Never update distribution versions on your own. It is **entirely** up to the Release Manager to bump the + version of distributions that are defined as ``project.version``. This goes almost without exception, + and any deviations from this rule should be discussed in the ``#release-management`` channel in Airflow Slack + beforehand. The only exception to this rule is when you are adding a new distribution to the repository - + in that case you should set the initial version of the distribution - usually ``0.0.1``, ``0.1.0`` + or ``1.0.0`` depending on the maturity of the package. Even then, you should still discuss it in the channel. + Review Comment: This feels out of place in a document talking about dependencies. ```suggestion ```
