> And we already have a number of mappings and conventions to handle that. For example the provider id mapping to dirs (apache.beam -> apache/beam), 'apache-airflow-providers-apache-beam' as the package name, and airflow/providers/apache/beam as the packages inside the distribution. Those will remain as they are - we cannot change them without breaking compatibility.

Can you give an example of what might break by having `providers/apache-beam/src/airflow/providers/apache/beam`?
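To make the question concrete, here is a rough sketch (purely illustrative) of the mappings mentioned above - the flat layout being asked about would only change the first of them:

provider_id = "apache.beam"

repo_dir = "providers/" + provider_id.replace(".", "/")  # providers/apache/beam
dist_name = "apache-airflow-providers-" + provider_id.replace(".", "-")  # apache-airflow-providers-apache-beam
import_path = "airflow.providers." + provider_id  # airflow.providers.apache.beam

# The flat alternative from the question:
flat_repo_dir = "providers/" + provider_id.replace(".", "-")  # providers/apache-beam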
-a

> On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> wrote:
>
> I think it will be better to keep it.
>
> The reason we have varying levels was to group things together - mainly Apache-related providers, but also Microsoft.
>
> And we already have a number of mappings and conventions to handle that. For example the provider id mapping to dirs (apache.beam -> apache/beam), 'apache-airflow-providers-apache-beam' as the package name, and airflow/providers/apache/beam as the packages inside the distribution. Those will remain as they are - we cannot change them without breaking compatibility.
>
> So if we change it to a flat structure we will have some inconsistencies - in some cases it will be a single folder, in others (packages) those will be two folders.
>
> I think it will be more harm than good if we get rid of the 'folder' structures - some of the code in breeze will have to treat those differently as well. Nothing extraordinary or very complex, but more complex-ish than it should be - already on top of handling potentially nested folders.
>
> So my preference would be to stay with apache/beam - it's just more consistent in handling the case where provider packages can be one-level nested.
>
> J
>
> On Tue, 7 Jan 2025 at 19:00, Vincent Beck <vincb...@apache.org> wrote:
>
>> Good question. I always found it confusing to have some providers at different levels. Examples:
>> - "airbyte" in the "providers" directory (I would qualify it as a "regular" provider)
>> - "hive" in "providers/apache"
>> - "amazon" in "providers", but which contains only one sub-directory, "aws"
>>
>> I would be in favor of using "-" instead of "/" so that all providers are at the same level.
>>
>> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote:
>>> +1 to this in general terms, it will hopefully reduce a lot of the boilerplate we need.
>>>
>>> As for the amazon/aws example specifically, that does bring up a question: should we have `/` or `-`? To give some examples:
>>>
>>> cncf kubernetes: ./providers/cncf/kubernetes or ./providers/cncf-kubernetes
>>> Apache hive: ./providers/apache/hive or ./providers/apache-hive
>>> AWS: ./providers/amazon/aws or ./providers/amazon-aws
>>>
>>> There is no requirement from python etc. on one form or the other (as it's just a folder, not part of the module name), so it's whatever makes the most sense to us.
>>>
>>> Jarek and Dennis (and others): what are your preferences on these styles?
>>>
>>> -ash
>>>
>>>> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>> Oh. And one other benefit of it will be that we will be able to get rid of about 40% of the "Providers Manager" code. Currently, in the Providers Manager we have a lot of "ifs" that make it possible to use providers in breeze and the local environment from the sources. In a "production" installation we are using "get_provider_info" entry points to discover providers and to discover whether a provider is installed, but when you use the current providers installed in Breeze inside "airflow", we rely on `provider.yaml` being present in the "airflow.providers.PROVIDER_ID" path - so we effectively have two paths of discovering information about the providers installed.
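For anyone following along, the entry-point discovery path mentioned here looks roughly like the sketch below - assuming the "apache_airflow_provider" entry point group that installed provider distributions register. Illustrative only, not the actual Providers Manager code:

from importlib.metadata import entry_points

# Enumerate installed providers purely via entry points, with no need to
# scan the source tree for provider.yaml files.
# Needs Python 3.10+ for the "group" keyword argument.
for ep in entry_points(group="apache_airflow_provider"):
    provider_info = ep.load()()  # calls the provider's get_provider_info()
    print(provider_info.get("package-name"), provider_info.get("versions"))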
>>>> After all providers are migrated to the new structure, all providers are separate "distributions" - and when you run `uv sync` (which will install all providers thanks to the workspace feature) or `pip install -e ./providers/aws` (which you will have to do manually to work on the provider - if you use `pip` rather than uv) - then we will not have to use the separate path to read provider.yaml, because the right entrypoint for the provider will be installed as well - so we will be able to get rid of quite some code that is currently only used in the airflow development environment.
>>>>
>>>> J.
>>>>
>>>> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> Those are very good questions :)
>>>>>
>>>>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis <ferru...@amazon.com.invalid> wrote:
>>>>>
>>>>>> To clarify that I understand your diagram correctly, let's say you clone the Airflow repo to ~/workspace/airflow/. Does this mean that the AWS Glue Hook which used to live at ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a random example) will be located at ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py? That feels unnecessarily repetitive to me, maybe it makes sense but I'm missing the context?
>>>>>
>>>>> Yes - it means that there is this repetitiveness, but for a good reason - those two "amazon/aws" serve different purposes:
>>>>>
>>>>> * The first "providers/amazon/aws" is just where the whole provider "complete project" is stored - it's basically a directory where the "aws provider" is stored, a convenient folder to locate it in, that keeps it separate from other providers
>>>>> * The second "src/airflow/providers/amazon/aws" is the python package where the source files are stored - this is how (inside the sub-folder) you tell the actual python "import" where to look for the sources.
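In other words, only the first part of that long path (where the provider project lives in the repo) is new; the import path, and therefore user code, stays exactly as it is today. A minimal sketch using the Glue hook example from the question above (the hook class name is assumed here purely for illustration):

# New location of the file in the repo:
#   providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py
# Import path, unchanged:
from airflow.providers.amazon.aws.hooks.glue import GlueJobHook  # class name assumed

print(GlueJobHook.__module__)  # -> airflow.providers.amazon.aws.hooks.glue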
>>>>> What really matters is that (eventually) `~/workspace/airflow/providers/amazon/aws/` can be treated as a completely separate python project - the source of a "standalone" provider python project. There is a "pyproject.toml" file at the root of it, and you can do this (for example):
>>>>>
>>>>> cd providers/amazon/aws/
>>>>> uv sync
>>>>>
>>>>> And with it you will be able to work on the AWS provider exclusively as a separate project (this is not yet complete with the move - tests are not entirely possible to run today - but it will be possible as the next step - I explained it in https://github.com/apache/airflow/pull/45259#issuecomment-2572427916).
>>>>>
>>>>> This has a number of benefits, but one of them is that you will be able to build the provider by just running the `build` command of your favourite PEP-standards-compliant frontend:
>>>>>
>>>>> cd providers/amazon/aws/
>>>>> `uv build` (or `hatch build`, `poetry build`, `flit build`, ...)
>>>>>
>>>>> This will create the provider package inside the `dist` folder. I just did it in my PR with `uv` in the first `airbyte` project:
>>>>>
>>>>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
>>>>> Building source distribution...
>>>>> Building wheel from source distribution...
>>>>> Successfully built dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
>>>>> Successfully built dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
>>>>>
>>>>> That's it. That also allows cases like installing provider packages using git URLs - which I used earlier today to test whether the incoming PR for pygments actually solves the problem we had yesterday https://github.com/apache/airflow/pull/45416 (basically we just make our provider packages "standard" python packages that all the tools support). Anyone who would like to install a commit hash or branch version of the "airbyte" package from the main version of the Airflow repo will be able to do:
>>>>>
>>>>> pip install "apache-airflow-providers-airbyte @ git+https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID"
>>>>>
>>>>> Currently, in order to create the package, we need to manually extract the "amazon" subtree, copy it elsewhere, prepare some files dynamically (pyproject.toml, README.rst and a few others) and only then build the package. All of this - copying the file structure, creating new files, running the build command and finally deleting the copied files - is done dynamically and under the hood by the "breeze release-management prepare-provider-packages" command. With this change, the directory structure in our `git` repo is totally standard and allows us (and anyone else) to build the package directly from it.
>>>>>
>>>>>> And what is the plan for system tests? As part of this reorganization, could they be moved into providers/{PROVIDER_ID}/tests/system? That seems more intuitive to me than their current location in providers/tests/system/{PROVIDER_ID}/example_foo.py.
>>>>>
>>>>> Oh yeah - I missed that in the original structure, as the "airbyte" provider (that I chose as the first one) did not contain "system" tests - but with one of the next providers after that I was planning to make sure system tests are covered. They are supposed to be moved to "tests/system" of course - Elad had a similar question and I also explained it in detail in https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
>>>>>
>>>>> I hope it answers the questions. If not - I am happy to add more clarifications :)
>>>>>
>>>>>> J.