I think it will be better to keep it. The reason we have varying levels was to group things together - mainly Apache-related providers, but also Microsoft.
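For illustration, the nested layout and the flat alternative discussed in the quoted thread below would look roughly like this (only a small, assumed sample of providers is shown):

# nested, one-level grouping (current convention)
providers/apache/beam
providers/apache/hive
providers/cncf/kubernetes
providers/amazon/aws
providers/airbyte

# flat alternative ("-" instead of "/")
providers/apache-beam
providers/apache-hive
providers/cncf-kubernetes
providers/amazon-aws
providers/airbyte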
And we already have a number of mappings and conventions to handle that. For example, the provider ID mapping to dirs (apache.beam -> apache/beam), 'apache-airflow-providers-apache-beam' as the package name, and airflow/providers/apache/beam as the package inside the distribution. Those will remain as they are - we cannot change them without breaking compatibility. So if we change it to a flat structure we will have some inconsistencies - in some cases it will be a single folder, in other cases (packages) those will be two folders. I think it will be more harm than good if we get rid of the 'folder' structure - some of the code in breeze will have to treat those differently as well. Nothing extraordinary or very complex, but more complex-ish than it should be - on top of already handling potentially nested folders. So my preference would be to stay with apache/beam - it just handles the case where provider packages can be one-level nested more consistently.

J.

On Tue, Jan 7, 2025 at 19:00 Vincent Beck <vincb...@apache.org> wrote:

> Good question. I always found it confusing to have some providers at different levels. Examples:
> - "airbyte" in the "providers" directory (I would qualify it as a "regular" provider)
> - "hive" in "providers/apache"
> - "amazon" in "providers", which contains only one subdirectory, "aws"
>
> I would be in favor of using "-" instead of "/" so that all providers are at the same level.
>
> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote:
> > +1 to this in general terms, it will hopefully reduce a lot of the boilerplate we need.
> >
> > As for the amazon/aws example specifically, that does bring up a question: should we have `/` or `-`? To give some examples:
> >
> > cncf kubernetes: ./providers/cncf/kubernetes or ./providers/cncf-kubernetes
> > Apache hive: ./providers/apache/hive or ./providers/apache-hive
> > AWS: ./providers/amazon/aws or ./providers/amazon-aws
> >
> > There is no requirement from python etc. on one form or the other (as it's just a folder, not part of the module name), so it's whatever makes most sense to us.
> >
> > Jarek and Dennis (and others): what are your preferences on these styles?
> >
> > -ash
> >
> > > On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > Oh. And one other benefit of it will be that we will be able to get rid of about 40% of the "Providers Manager" code. Currently, in the Providers Manager we have a lot of "ifs" that make it possible to use providers in breeze and the local environment from the sources. In a "production" installation we are using "get_provider_info" entry points to discover providers and to check if a provider is installed, but when you use the current providers installed in Breeze inside "airflow", we rely on `provider.yaml` to be present in the "airflow.providers.PROVIDER_ID" path - so we effectively have two paths of discovering information about the installed providers.
> > >
> > > After all providers are migrated to the new structure, all providers are separate "distributions" - and when you run `uv sync` (which will install all providers thanks to the workspace feature) or `pip install -e ./providers/aws` (which you will have to do manually to work on the provider - if you use `pip` rather than uv) - then we will not have to use the separate path to read provider.yaml, because the right entrypoint for the provider will be installed as well - so we will be able to get rid of quite some code that is currently only used in the airflow development environment.
> > >
> > > J.
> > >
> > >
> > > On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > >> Those are very good questions :)
> > >>
> > >> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis <ferru...@amazon.com.invalid> wrote:
> > >>
> > >>> To clarify that I understand your diagram correctly, let's say you clone the Airflow repo to ~/workspace/airflow/. Does this mean that the AWS Glue Hook which used to live at ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a random example) will be located at ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py? That feels unnecessarily repetitive to me, maybe it makes sense but I'm missing the context?
> > >>>
> > >> Yes - it means that there is this repetitiveness, but for a good reason - those two "amazon/aws" serve different purposes:
> > >>
> > >> * The first "providers/amazon/aws" is just where the whole provider "complete project" is stored - it's basically a directory where the "aws provider" is stored, a convenient folder to locate it in, that makes it separate from other providers
> > >> * The second "src/airflow/providers/amazon/aws" is the python package where the source files are stored - this is how (inside the sub-folder) you tell the actual python "import" to look for the sources.
> > >>
> > >> What really matters is that (eventually) `~/workspace/airflow/providers/amazon/aws/` can be treated as a completely separate python project - a source of a "standalone" provider python project. There is a "pyproject.toml" file at the root of it, and if you do this (for example):
> > >>
> > >> cd providers/amazon/aws/
> > >> uv sync
> > >>
> > >> then you will be able to work on the AWS provider exclusively as a separate project (this is not yet complete with the move - tests are not entirely possible to run today - but it will be possible as a next step - I explained it in https://github.com/apache/airflow/pull/45259#issuecomment-2572427916).
> > >>
> > >> This has a number of benefits, but one of them is that you will be able to build the provider by just running the `build` command of your favourite PEP-standard-compliant frontend:
> > >>
> > >> cd providers/amazon/aws/
> > >> `uv build` (or `hatch build` or `poetry build` or `flit build`) ....
> > >>
> > >> This will create the provider package inside the `dist` folder. I just did it in my PR with `uv` in the first "airbyte" project:
> > >>
> > >> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
> > >> Building source distribution...
> > >> Building wheel from source distribution...
> > >> Successfully built dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
> > >> Successfully built dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
> > >>
> > >> That's it. That also allows cases like installing provider packages using git URLs - which I used earlier today to test if the incoming PR for pygments is actually solving the problem we had yesterday https://github.com/apache/airflow/pull/45416 (basically we just make our provider packages "standard" python packages that all the tools support). Anyone who would like to install a commit, hash or branch version of the "airbyte" package from the main version of the Airflow repo will be able to do:
> > >>
> > >> pip install "apache-airflow-providers-airbyte @ git+https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID"
> > >>
> > >> Currently, in order to create the package, we need to manually extract the "amazon" subtree, copy it elsewhere, prepare some files dynamically (pyproject.toml, README.rst and a few others) and only then build the package. All of this - copying the file structure, creating new files, running the build command and finally deleting the copied files - is done dynamically and under the hood by the "breeze release-management prepare-provider-packages" command. With this change, the directory structure in our `git` repo is totally standard and allows us (and anyone else) to build the package directly from it.
> > >>
> > >>> And what is the plan for system tests? As part of this reorganization, could they be moved into providers/{PROVIDER_ID}/tests/system? That seems more intuitive to me than their current location in providers/tests/system/{PROVIDER_ID}/example_foo.py.
> > >>>
> > >> Oh yeah - I missed that in the original structure, as the "airbyte" provider (which I chose as the first one) did not contain "system" tests - but with one of the two providers after that I was planning to make sure system tests are covered. They are supposed to be moved to "tests/system" of course - Elad had a similar question and I also explained it in detail in https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
> > >>
> > >> I hope it answers the questions. If not - I am happy to add more clarifications :)
> > >>
> > >>> J.
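Pulling the commands from the thread above together, here is a rough sketch of the per-provider workflow once a provider lives in its own directory with a pyproject.toml (the airbyte paths follow the layout described above; COMMIT_ID is left as a placeholder, and the pip git-URL below uses pip's standard "#subdirectory=" spelling, which differs slightly from the form shown in the thread):

# work on a single provider as a standalone project
cd providers/airbyte/
uv sync                  # or: pip install -e .

# build the provider distribution with any PEP-standard build frontend
uv build                 # or: hatch build / poetry build / flit build
ls dist/                 # apache_airflow_providers_airbyte-*.tar.gz and *.whl

# install the provider straight from a git ref (COMMIT_ID is a placeholder)
pip install "apache-airflow-providers-airbyte @ git+https://github.com/apache/airflow.git@COMMIT_ID#subdirectory=providers/airbyte"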