To make sure I understand your diagram correctly, let's say you clone the Airflow repo to ~/workspace/airflow/. Does this mean that the AWS Glue Hook, which used to live at ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a random example), will be located at ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py? That feels unnecessarily repetitive to me. Maybe it makes sense and I'm just missing the context?
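A minimal sketch of where the repetition comes from. It assumes the layout from the announcement: each provider directory is a standalone Python project with a "src" folder, and inside `src/` the package keeps its full import path, so `import airflow.providers.amazon.aws.hooks.glue` keeps working after installation. The helper names and the choice of `amazon/aws` as the provider directory (following the example above) are illustrative assumptions, not code from the PR.

```python
from pathlib import PurePosixPath


def new_style_path(repo_root: str, provider_dir: str, module_rel: str) -> PurePosixPath:
    """Map a module to its location in the proposed per-provider layout.

    repo_root    -- checkout location, e.g. "~/workspace/airflow" (hypothetical)
    provider_dir -- provider sub-path, e.g. "amazon/aws" (as in the example above)
    module_rel   -- module path inside the provider, e.g. "hooks/glue.py"
    """
    root = PurePosixPath(repo_root)
    # Outer part: the provider becomes its own project directory under providers/.
    project = root / "providers" / provider_dir
    # Inner part: "src" repeats the full import path of the package, so the
    # installed distribution is still importable as airflow.providers.<provider>.
    return project / "src" / "airflow" / "providers" / provider_dir / module_rel


def import_name(provider_dir: str, module_rel: str) -> str:
    """Dotted import path of the module. The move does not change it."""
    dotted = PurePosixPath(provider_dir, module_rel).with_suffix("")
    return "airflow.providers." + ".".join(dotted.parts)


new = new_style_path("~/workspace/airflow", "amazon/aws", "hooks/glue.py")
print(new)
print(import_name("amazon/aws", "hooks/glue.py"))
```

In this sketch the outer `providers/amazon/aws` only names the project directory inside the repository. The inner `src/airflow/providers/amazon/aws` is the import path that users see, and it is the part that has to stay stable.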
And what is the plan for system tests? As part of this reorganization, could they be moved into providers/{PROVIDER_ID}/tests/system? That seems more intuitive to me than their current location in providers/tests/system/{PROVIDER_ID}/example_foo.py. Proposed location:

providers
|- PROVIDER_ID
|  |- src
|  |  |- airflow
|  |     |- providers
|  |        |- PROVIDER_ID
|  |- tests
|  |  |- providers
|  |     |- PROVIDER_ID
|  |        |- hooks
|  |        |- operators
|  |        |- sensors
|  |        |- system   <<< HERE >>>
|  |        |- ....
|  |- docs
|  |  |- .latest-doc-only-changes.txt
|  |- pyproject.toml
|  |- CHANGELOG.rst
|  |- provider.yaml
|  |- README.rst
|- PROVIDER_ID2
...

- ferruzzi

________________________________
From: Jarek Potiuk <ja...@potiuk.com>
Sent: Sunday, January 5, 2025 4:40 PM
To: dev@airflow.apache.org
Subject: [EXT] [ANNOUNCEMENT] Moving Providers to separate sub-projects soon-ish

Hello everyone,

I have an almost-ready PR (I still need to fix doc building and do some more tests / docs for local development environments) for the first step of separating providers into sub-projects inside our mono-repo. This was discussed and agreed a long time ago, and I had a POC for it some years ago, but only recently, with `uv workspaces`, could we make progress (finally the tooling caught up with what Airflow needs). The `uv workspaces` feature was implemented by the Astral team after discussing with us how it should work so that it could be used in Airflow.

PR here: https://github.com/apache/airflow/pull/45259

Only one provider is moved in this PR: "airbyte".
The way it is implemented, breeze and CI support both "old style" and "new style" providers at the same time. I want to move two or three more providers that are a little more complex. Once that is done, we should be able to move all the remaining providers relatively quickly, one by one. (I would love the usual involvement of others. I will create a script that moves things mostly automatically, but there will likely be some small things to fix in each provider, so it is better to do it one by one and solve a smaller number of problems at a time.) During the move, all regular processes (including all CI builds and package releases) should work as usual, because the relevant breeze commands have been converted to support both cases automatically. Once the move is complete, we should be able to remove the complex-ish code for "old-style" providers.

There are a few further steps: rearranging how we use workspaces for `tests_common`, and the final split of airflow core into multiple packages (and likely moving the airflow code to a "src" subdirectory of the project). Those should be done later, once we agree on exactly how to split the packages. For example, you can't yet run a provider's tests "standalone" within the provider because of the `tests_common` dependency. They still have to be run as part of the "airflow" project. We might also further reduce the data kept in provider.yaml (in this PR, dependencies are moved from provider.yaml to the provider's pyproject.toml). There are a few more cleanups to do there, but it's best to move the providers first and then do the next steps. The issue in the DEV/CI project for that one: https://github.com/apache/airflow/issues/44511

Here is the new provider directory structure in general. In effect, each provider is a separate "standard" Python project, so we will not have to copy files around when building packages. Each provider will simply be another package.
providers
|- PROVIDER_ID
|  |- src
|  |  |- airflow
|  |     |- providers
|  |        |- PROVIDER_ID
|  |- tests
|  |  |- providers
|  |     |- PROVIDER_ID
|  |- docs
|  |  |- .latest-doc-only-changes.txt
|  |- pyproject.toml
|  |- CHANGELOG.rst
|  |- provider.yaml
|  |- README.rst
|- PROVIDER_ID2
...

Looking forward to reviews and merging it soon.

J.
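The directory layout above can be reproduced mechanically. This is a hypothetical scaffolding helper, not the move script mentioned in the announcement. It creates the empty skeleton for a provider named "example" in a temporary directory and lists it. The listing shows the mirroring: code lives under src/airflow/providers/<id> and tests under tests/providers/<id>. Package files and real metadata are deliberately left out.

```python
import tempfile
from pathlib import Path

# Top-level files named in the layout; created as empty placeholders here.
TOP_LEVEL_FILES = ["pyproject.toml", "CHANGELOG.rst", "provider.yaml", "README.rst"]


def scaffold_provider(providers_root: Path, provider_id: str) -> Path:
    """Create an empty skeleton of the new-style layout for one provider.

    Hypothetical helper: it only creates directories and empty files.
    It writes no package code, pyproject.toml metadata, or docs.
    """
    project = providers_root / provider_id
    (project / "src" / "airflow" / "providers" / provider_id).mkdir(parents=True)
    (project / "tests" / "providers" / provider_id).mkdir(parents=True)
    (project / "docs").mkdir(parents=True)
    (project / "docs" / ".latest-doc-only-changes.txt").touch()
    for name in TOP_LEVEL_FILES:
        (project / name).touch()
    return project


def layout(project: Path) -> list[str]:
    """List every path under the project, relative to it, in sorted order."""
    return sorted(p.relative_to(project).as_posix() for p in project.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    created = layout(scaffold_provider(Path(tmp) / "providers", "example"))

for entry in created:
    print(entry)
```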