+1 to this in general terms; it will hopefully reduce a lot of the 
boilerplate we need.

As for the amazon/aws example specifically, that does bring up a question: 
should we use `/` or `-`? To give some examples:

cncf kubernetes: ./providers/cncf/kubernetes or ./providers/cncf-kubernetes
Apache hive: ./providers/apache/hive or ./providers/apache-hive
AWS: ./providers/amazon/aws or ./providers/amazon-aws

There is no requirement from Python etc. for one form or the other (as it’s 
just a folder, not part of the module name), so it’s whatever makes the most 
sense to us.

Jarek and Dennis (and others): what are your preferences on these styles?

-ash

> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
> Oh, and one other benefit is that we will be able to get rid
> of about 40% of the "Providers Manager" code. Currently, in the Providers
> Manager we have a lot of "ifs" that make it possible to use providers in
> breeze and the local environment from the sources. In a "production"
> installation we use "get_provider_info" entry points to discover providers
> and to check whether a provider is installed, but when you use the current
> providers installed in Breeze inside "airflow", we rely on `provider.yaml`
> being present in the "airflow.providers.PROVIDER_ID" path - so we
> effectively have two paths for discovering information about installed
> providers.
> 
> After all providers are migrated to the new structure, every provider is a
> separate "distribution" - and when you run `uv sync` (which will install
> all providers thanks to the workspace feature) or `pip install -e
> ./providers/aws` (which you will have to do manually to work on the
> provider if you use `pip` rather than `uv`) - then we will not need the
> separate path that reads provider.yaml, because the right entry point for
> the provider will be installed as well. So we will be able to get rid of
> quite some code that is currently only used in the airflow development
> environment.
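To illustrate the entry-point discovery path described above, here is a minimal sketch of how installed providers can be found via `importlib.metadata`. The group name `apache_airflow_provider` is the one provider distributions register their `get_provider_info` callable under; the helper name `discover_providers` is hypothetical, introduced only for this example.

```python
from importlib.metadata import entry_points


def discover_providers(group: str = "apache_airflow_provider"):
    """Collect provider-info entry points from installed distributions.

    Each entry point, when loaded and called, returns the provider
    metadata dict that the Providers Manager would otherwise have to
    read from a provider.yaml on the source path.
    """
    eps = entry_points()
    if hasattr(eps, "select"):      # Python 3.10+ API
        found = eps.select(group=group)
    else:                           # Python 3.8/3.9 dict-style API
        found = eps.get(group, [])
    # (name, "module:attr") pairs; load with ep.load() to get the callable
    return [(ep.name, ep.value) for ep in found]


# Example: list whatever providers the current environment has installed.
for name, target in discover_providers():
    print(name, "->", target)
```

With standard distributions installed via `uv sync` or `pip install -e`, this single code path works the same in development and production, which is what makes the second discovery path removable.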
> 
> J.
> 
> 
> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> 
>> Those are very good questions :)
>> 
>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis
>> <ferru...@amazon.com.invalid> wrote:
>> 
>>> To clarify that I understand your diagram correctly, let's say you clone
>>> the Airflow repo to ~/workspace/airflow/.  Does this mean that the AWS Glue
>>> Hook which used to live at
>>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a random
>>> example) will be located at
>>> ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py?
>>> That feels unnecessarily repetitive to me, maybe it makes sense but I'm
>>> missing the context?
>>> 
>> 
>> Yes - it means there is this repetitiveness, but for a good reason -
>> those two "amazon/aws" segments serve different purposes:
>> 
>> * The first "providers/amazon/aws" is simply where the whole provider
>> project is stored - a convenient folder to locate it in, which keeps it
>> separate from the other providers.
>> * The second "src/airflow/providers/amazon/aws" is the Python package
>> where the source files are stored - this is how (inside the sub-folder)
>> you tell the actual Python "import" machinery where to look for the
>> sources.
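Putting the two layers together, the layout would look roughly like this (an abbreviated sketch based on the paths discussed in this thread, not an exhaustive listing):

```
providers/amazon/aws/                     # the provider "project" folder
├── pyproject.toml                        # makes it a standalone distribution
└── src/
    └── airflow/providers/amazon/aws/     # the importable Python package
        └── hooks/
            └── glue.py
```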
>> 
>> What really matters is that (eventually)
>> `~/workspace/airflow/providers/amazon/aws/` can be treated as a completely
>> separate Python project - the source of a "standalone" provider Python
>> project.
>> There is a "pyproject.toml" file at the root of it, and if you do this
>> (for example):
>> 
>> cd providers/amazon/aws/
>> uv sync
>> 
>> then you will be able to work on the AWS provider exclusively as a
>> separate project (this is not yet complete with the move - the tests are
>> not entirely possible to run today - but that will come as a next step; I
>> explained it in
>> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916).
>> 
>> This has a number of benefits, but one of them is that you will be able
>> to build the provider by just running the `build` command of your
>> favourite PEP-compliant build frontend:
>> 
>> cd providers/amazon/aws/
>> `uv build` (or `hatch build` or `poetry build` or `flit build` )....
>> 
>> This will create the provider package inside the `dist` folder. I just
>> did it in my PR with `uv` in the first `airbyte` provider:
>> 
>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
>> Building source distribution...
>> Building wheel from source distribution...
>> Successfully built dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
>> Successfully built
>> dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
>> 
>> That's it. That also allows cases like installing provider packages from
>> git URLs - which I used earlier today to test whether the incoming
>> pygments PR actually solves the problem we had yesterday:
>> https://github.com/apache/airflow/pull/45416 (basically we just make our
>> provider packages "standard" Python packages that all the tools support).
>> Anyone who would like to install a commit, tag or branch version of the
>> "airbyte" package from the main version of the Airflow repo will be able
>> to do:
>> 
>> pip install "apache-airflow-providers-airbyte @
>> git+https://github.com/apache/airflow.git@COMMIT_ID#subdirectory=providers/airbyte"
>> 
>> Currently, in order to create the package, we need to manually extract
>> the "amazon" subtree, copy it elsewhere, dynamically prepare some files
>> (pyproject.toml, README.rst and a few others), and only then build the
>> package. All of this - copying the file structure, creating new files,
>> running the build command and finally deleting the copied files - is done
>> dynamically and under the hood by the "breeze release-management
>> prepare-provider-packages" command. With this change, the directory
>> structure in our `git` repo is totally standard and allows us (and
>> anyone else) to build the package directly from it.
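For a sense of what the checked-in, "totally standard" layout requires, a provider's pyproject.toml only needs regular PEP 621 metadata plus the provider entry point. The following is an illustrative sketch, not the actual file; names, versions and the choice of build backend are assumptions:

```toml
[project]
name = "apache-airflow-providers-airbyte"
version = "5.0.0"                          # illustrative
dependencies = ["apache-airflow>=2.9.0"]   # illustrative floor

# The entry point that replaces the provider.yaml discovery path:
[project.entry-points.apache_airflow_provider]
provider_info = "airflow.providers.airbyte.get_provider_info:get_provider_info"

[build-system]
requires = ["flit_core>=3.0"]              # any PEP 517 backend works
build-backend = "flit_core.buildapi"
```

Because this is plain standard metadata, any PEP 517 frontend (`uv build`, `python -m build`, `hatch build`, ...) can build the distribution from the subtree without the copy-and-generate dance.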
>> 
>> 
>>> And what is the plan for system tests? As part of this reorganization,
>>> could they be moved into providers/{PROVIDER_ID}/tests/system?  That seems
>>> more intuitive to me than their current location in
>>> providers/tests/system/{PROVIDER_ID}/example_foo.py.
>>> 
>>> 
>> Oh yeah - I missed that in the original structure, as the "airbyte"
>> provider (which I chose as the first one) did not contain system tests -
>> but for one of the next two providers I was planning to make sure system
>> tests are covered. They are supposed to be moved to "tests/system" of
>> course - Elad had a similar question and I also explained it in detail in
>> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
>> 
>> 
>> I hope this answers the questions. If not, I am happy to add more
>> clarifications :)
>> 
>> 
>>> J.
>>> 
>> 
