Oh, and one other benefit: we will be able to get rid of about 40% of the
"ProvidersManager" code. Currently the ProvidersManager contains a lot of
"ifs" that make it possible to use providers from sources in Breeze and in
the local environment. In a "production" installation we use the
"get_provider_info" entry points to discover providers and determine
whether a provider is installed, but when you use providers from sources
inside Breeze, we rely on a `provider.yaml` file being present in the
"airflow.providers.PROVIDER_ID" path - so we effectively have two code
paths for discovering information about installed providers.
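For illustration, the "production" discovery path boils down to roughly
this (a minimal sketch, assuming Python 3.10+ `importlib.metadata` and the
"apache_airflow_provider" entry point group that provider distributions
register):

from importlib.metadata import entry_points

# Each installed provider distribution exposes an entry point in the
# "apache_airflow_provider" group pointing at its get_provider_info().
for ep in entry_points(group="apache_airflow_provider"):
    provider_info = ep.load()()  # the provider metadata as a dict
    print(provider_info["package-name"], provider_info["versions"])

Once every provider ships as a real distribution, this single path is
enough - no `provider.yaml` scanning fallback needed.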
After all providers are migrated to the new structure, all providers are
separate "distributions" - and when you run `uv sync` (which will install
all providers thanks to the workspace feature) or `pip install -e
./providers/aws` (which you will have to do manually to work on the
provider if you use `pip` rather than `uv`) - then we will not have to use
the separate path to read provider.yaml, because the right entry point for
the provider will be installed as well. So we will be able to get rid of
quite some code that is currently only used in the Airflow development
environment.

J.

On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Those are very good questions :)
>
> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis
> <ferru...@amazon.com.invalid> wrote:
>
>> To clarify that I understand your diagram correctly, let's say you
>> clone the Airflow repo to ~/workspace/airflow/. Does this mean that the
>> AWS Glue Hook which used to live at
>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a random
>> example) will be located at
>> ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py?
>> That feels unnecessarily repetitive to me, maybe it makes sense but I'm
>> missing the context?
>
> Yes - it means that there is this repetitiveness, but for a good reason
> - those two "amazon/aws" serve different purposes:
>
> * The first "providers/amazon/aws" is just where the whole provider
> project is stored - basically a convenient directory that keeps the "aws
> provider" separate from the other providers.
> * The second "src/airflow/providers/amazon/aws" is the Python package
> where the source files are stored - this is how (inside the sub-folder)
> you tell Python's "import" where to look for the sources. There is a
> short layout sketch at the end of this message.
>
> What really matters is that (eventually)
> `~/workspace/airflow/providers/amazon/aws/` can be treated as a
> completely separate Python project - the source of a "standalone"
> provider package. There is a "pyproject.toml" file at the root of it
> (also sketched below), and if you do this (for example):
>
> cd providers/amazon/aws/
> uv sync
>
> you will be able to work on the AWS provider exclusively as a separate
> project (this is not yet complete with the move - tests cannot be fully
> run today - but it will be possible as a next step; I explained it in
> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916).
>
> This has a number of benefits, and one of them is that you will be able
> to build the provider by just running the `build` command of your
> favourite PEP-standards-compliant frontend:
>
> cd providers/amazon/aws/
> uv build (or `hatch build` or `poetry build` or `flit build`) ...
>
> This will create the provider package inside the `dist` folder. I just
> did it in my PR with `uv` in the first "airbyte" project:
>
> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
> Building source distribution...
> Building wheel from source distribution...
> Successfully built dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
> Successfully built
> dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
>
> That's it.
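>
> To make the layout concrete, here is a rough sketch of the structure
> (illustrative only, reusing the Glue hook example from above):
>
> providers/amazon/aws/                 <- the provider "project" root
>   pyproject.toml                      <- standard Python project metadata
>   src/
>     airflow/providers/amazon/aws/     <- the importable Python package
>       hooks/
>         glue.py
>   tests/
>     ...                               <- provider tests (incl. tests/system,
>                                          see the system-tests answer below)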
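>
> And a rough sketch of what such a per-provider pyproject.toml might
> contain - the build backend and version pins here are assumptions for
> illustration, not the exact contents of the real file:
>
> [build-system]
> requires = ["flit_core>=3.4"]  # illustrative backend choice
> build-backend = "flit_core.buildapi"
>
> [project]
> name = "apache-airflow-providers-amazon"
> version = "..."  # the real file pins the provider version
> dependencies = [
>     # illustrative floor - the real constraints live in the file itself
>     "apache-airflow>=2.9.0",
> ]
>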
> That also allows cases like installing provider packages using git URLs
> - which I used earlier today to test whether the incoming pygments PR
> actually solves the problem we had yesterday:
> https://github.com/apache/airflow/pull/45416 (basically we just make our
> provider packages "standard" Python packages that all the tools
> support). Anyone who would like to install a commit-hash or branch
> version of the "airbyte" package from the main version of the Airflow
> repo will be able to do:
>
> pip install "apache-airflow-providers-airbyte @ git+https://github.com/apache/airflow.git@COMMIT_ID#subdirectory=providers/airbyte"
>
> Currently, in order to create the package, we need to manually extract
> the "amazon" subtree, copy it elsewhere, prepare some files dynamically
> (pyproject.toml, README.rst and a few others) and only then build the
> package. All of this - copying the file structure, creating new files,
> running the build command and finally deleting the copied files - is
> currently done dynamically and under the hood by the "breeze
> release-management prepare-provider-packages" command. With this change,
> the directory structure in our git repo is completely standard and
> allows us (and anyone else) to build the packages directly from it.
>
>> And what is the plan for system tests? As part of this reorganization,
>> could they be moved into providers/{PROVIDER_ID}/tests/system? That
>> seems more intuitive to me than their current location in
>> providers/tests/system/{PROVIDER_ID}/example_foo.py.
>
> Oh yeah - I missed that in the original structure because the "airbyte"
> provider (which I chose as the first one) does not contain "system"
> tests - but in one of the next few providers I was planning to make sure
> system tests are covered. They are supposed to be moved to
> "tests/system" of course - Elad had a similar question and I also
> explained it in detail in
> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
>
> I hope this answers the questions. If not - I am happy to add more
> clarifications :)
>
> J.