Also, one more thing: doing this migration to the new structure, we could *potentially* use the opportunity to discontinue some providers. There are quite likely a few that make very little sense to maintain any more, and we could use the Airflow 3 release as the moment to say "those providers will not be maintained any more". I think, as long as we vote on it, there is no problem in deciding it. We could even do it in a way where, if **SOMEONE** from the community is very much interested in keeping a particular provider maintained, they could volunteer and either lead moving of the provider to the new structure, or maybe even fork it and decide to release it on their own. We have quite a few providers that are pretty "proprietary" in nature, and maybe it would be a great idea to use this move as an opportunity to test the water and see how dedicated the "stakeholders" in those providers are, and whether they would like to take them over. And as a community we might still decide "we do not want to maintain this provider any more" as well.
J

On Wed, Jan 8, 2025 at 12:31 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Since each provider will be self-contained, it will be easier for
> isolation and for moving providers one-by-one to Airflow 3.
>
> On Wed, 8 Jan 2025 at 00:04, Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > I think it will be better to keep it.
> >
> > The reason we have varying levels was to group things together - mainly
> > Apache-related providers, but also Microsoft.
> >
> > And we already have a number of mappings and conventions to handle that.
> > For example, the provider ID maps to dirs (apache.beam -> apache/beam),
> > 'apache-airflow-providers-apache-beam' is the package name, and
> > airflow/providers/apache/beam is the package inside the distribution.
> > Those will remain as they are - we cannot change them without breaking
> > compatibility.
> >
> > So if we change to a flat structure we will have some inconsistencies -
> > in some cases it will be a single folder, in others (packages) it will
> > be two folders.
> >
> > I think it would do more harm than good if we got rid of the 'folder'
> > structures - some of the code in breeze would have to treat those
> > differently as well. Nothing extraordinary or very complex, but more
> > complex-ish than it should be - on top of already handling potentially
> > nested folders.
> >
> > So my preference would be to stay with apache/beam - it just handles
> > more consistently the case where provider packages can be one level
> > nested.
> >
> > J
> >
> > On Tue, 7 Jan 2025 at 19:00, Vincent Beck <vincb...@apache.org> wrote:
> >
> > > Good question. I always found it confusing to have some providers at
> > > different levels.
> > > Examples:
> > > - "airbyte" in the "providers" directory (I would qualify it as a
> > >   "regular" provider)
> > > - "hive" in "providers/apache"
> > > - "amazon" in "providers", but which contains only one subdirectory,
> > >   "aws"
> > >
> > > I would be in favor of using "-" instead of "/" so that all providers
> > > are at the same level.
> > >
> > > On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote:
> > > > +1 to this in general terms; it will hopefully reduce a lot of the
> > > > boilerplate we need.
> > > >
> > > > As for the amazon/aws example specifically, that does bring up a
> > > > question: should we have `/` or `-`? To give some examples:
> > > >
> > > > cncf kubernetes: ./providers/cncf/kubernetes or ./providers/cncf-kubernetes
> > > > Apache hive: ./providers/apache/hive or ./providers/apache-hive
> > > > AWS: ./providers/amazon/aws or ./providers/amazon-aws
> > > >
> > > > There is no requirement from Python etc. for one form or the other
> > > > (as it's just a folder, not part of the module name), so it's
> > > > whatever makes most sense to us.
> > > >
> > > > Jarek and Dennis (and others): what are your preferences on these
> > > > styles?
> > > >
> > > > -ash
> > > >
> > > > > On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > >
> > > > > Oh, and one other benefit will be that we will be able to get rid
> > > > > of about 40% of the "Providers Manager" code. Currently, in
> > > > > Providers Manager we have a lot of "ifs" that make it possible to
> > > > > use providers in breeze and the local environment from the
> > > > > sources.
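The ID-to-name conventions discussed in the thread (provider ID to source directory, PyPI package name, and import path) can be sketched as a small helper. This is an illustrative reconstruction of those conventions, not actual Airflow code:

```python
def provider_paths(provider_id: str) -> dict:
    """Derive the conventional names for a provider from its dotted ID.

    Hypothetical helper illustrating the mapping conventions mentioned
    in the thread (apache.beam -> apache/beam, etc.); not Airflow code.
    """
    parts = provider_id.split(".")  # e.g. "apache.beam" -> ["apache", "beam"]
    return {
        # directory in the repo (possibly one level nested, e.g. apache/beam)
        "source_dir": "providers/" + "/".join(parts),
        # PyPI distribution name
        "package_name": "apache-airflow-providers-" + "-".join(parts),
        # import path inside the distribution
        "import_path": "airflow.providers." + provider_id,
    }

print(provider_paths("apache.beam"))
```

Because the dotted provider ID drives all three names, keeping the nested apache/beam layout keeps this mapping one-to-one - which is the consistency argument made above.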
> > > > > In a "production" installation we use "get_provider_info" entry
> > > > > points to discover providers and check whether a provider is
> > > > > installed, but when you use the current providers inside "airflow"
> > > > > in Breeze, we rely on `provider.yaml` being present in the
> > > > > "airflow.providers.PROVIDER_ID" path - so we effectively have two
> > > > > paths for discovering information about the installed providers.
> > > > >
> > > > > After all providers are migrated to the new structure, all
> > > > > providers will be separate "distributions" - and when you run `uv
> > > > > sync` (which will install all providers thanks to the workspace
> > > > > feature) or `pip install -e ./providers/aws` (which you will have
> > > > > to do manually to work on the provider if you use `pip` rather
> > > > > than uv) - then we will not have to use the separate path to read
> > > > > provider.yaml, because the right entrypoint for the provider will
> > > > > be installed as well. So we will be able to get rid of quite some
> > > > > code that is currently only used in the airflow development
> > > > > environment.
> > > > >
> > > > > J.
> > > > >
> > > > > On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > >
> > > > >> Those are very good questions :)
> > > > >>
> > > > >> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis
> > > > >> <ferru...@amazon.com.invalid> wrote:
> > > > >>
> > > > >>> To clarify that I understand your diagram correctly, let's say
> > > > >>> you clone the Airflow repo to ~/workspace/airflow/. Does this
> > > > >>> mean that the AWS Glue Hook which used to live at
> > > > >>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a
> > > > >>> random example) will be located at
> > > > >>> ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py?
> > > > >>> That feels unnecessarily repetitive to me; maybe it makes sense
> > > > >>> but I'm missing the context?
> > > > >>
> > > > >> Yes - it means there is this repetitiveness, but for a good
> > > > >> reason - those two "amazon/aws" serve different purposes:
> > > > >>
> > > > >> * The first "providers/amazon/aws" is just where the whole
> > > > >>   provider "complete project" is stored - basically a directory
> > > > >>   where the "aws provider" lives, a convenient folder that keeps
> > > > >>   it separate from the other providers.
> > > > >> * The second "src/airflow/providers/amazon/aws" is the Python
> > > > >>   package where the source files are stored - this is how (inside
> > > > >>   the sub-folder) you tell the actual Python "import" where to
> > > > >>   look for the sources.
> > > > >>
> > > > >> What really matters is that (eventually)
> > > > >> `~/workspace/airflow/providers/amazon/aws/` can be treated as a
> > > > >> completely separate Python project - the source of a "standalone"
> > > > >> provider Python project.
> > > > >> There is a "pyproject.toml" file at the root of it, and you can
> > > > >> do this (for example):
> > > > >>
> > > > >>     cd providers/amazon/aws/
> > > > >>     uv sync
> > > > >>
> > > > >> With that you will be able to work on the AWS provider
> > > > >> exclusively, as a separate project (this is not yet complete with
> > > > >> the move - tests are not entirely possible to run today - but it
> > > > >> will be possible as a next step; I explained it in
> > > > >> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916).
> > > > >>
> > > > >> This has a number of benefits, and one of them is that you will
> > > > >> be able to build a provider by just running the `build` command
> > > > >> of your favourite PEP-standard-compliant frontend:
> > > > >>
> > > > >>     cd providers/amazon/aws/
> > > > >>     uv build  (or hatch build, poetry build, flit build, ...)
> > > > >>
> > > > >> This will create the provider package inside the `dist` folder. I
> > > > >> just did it in my PR with `uv` in the first "airbyte" project:
> > > > >>
> > > > >>     root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
> > > > >>     Building source distribution...
> > > > >>     Building wheel from source distribution...
> > > > >>     Successfully built dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
> > > > >>     Successfully built dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
> > > > >>
> > > > >> That's it. It also allows cases like installing provider packages
> > > > >> using git URLs - which I used earlier today to test whether the
> > > > >> incoming PR for pygments actually solves the problem we had
> > > > >> yesterday: https://github.com/apache/airflow/pull/45416
> > > > >> (basically, we just make our provider packages "standard" Python
> > > > >> packages that all the tools support).
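For the git-URL installs mentioned above, pip supports installing a package that lives in a subdirectory of a git repository via the `#subdirectory=` fragment of a direct-reference requirement. A tiny helper that builds such a requirement string - my own illustration, with example names and the `main` revision as assumptions, not taken from the thread:

```python
def git_requirement(dist: str, repo: str, subdirectory: str, rev: str) -> str:
    """Build a PEP 508 direct reference for a package that lives in a
    subdirectory of a git repository, using pip's #subdirectory fragment.
    Illustrative helper, not an Airflow utility."""
    return f"{dist} @ git+{repo}@{rev}#subdirectory={subdirectory}"


# Example: the airbyte provider from a branch of the Airflow repo.
print(git_requirement(
    "apache-airflow-providers-airbyte",
    "https://github.com/apache/airflow.git",
    "providers/airbyte",
    "main",  # any branch, tag, or commit hash works here
))
```

The resulting string can be passed directly to `pip install`, which clones the repo at the given revision and builds only the named subdirectory.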
> > > > >> Anyone who would like to install a commit, hash, or branch
> > > > >> version of the "airbyte" package from the main version of the
> > > > >> Airflow repo will be able to do:
> > > > >>
> > > > >>     pip install "apache-airflow-providers-airbyte @ git+https://github.com/apache/airflow.git@COMMIT_ID#subdirectory=providers/airbyte"
> > > > >>
> > > > >> Currently, in order to create the package, we need to manually
> > > > >> extract the "amazon" subtree, copy it elsewhere, dynamically
> > > > >> prepare some files (pyproject.toml, README.rst and a few others),
> > > > >> and only then build the package. All of this - copying the file
> > > > >> structure, creating new files, running the build command, and
> > > > >> finally deleting the copied files - is currently done dynamically
> > > > >> and under the hood by the "breeze release-management
> > > > >> prepare-provider-packages" command. With this change, the
> > > > >> directory structure in our `git` repo is totally standard and
> > > > >> allows us (and anyone else) to build the package directly from
> > > > >> it.
> > > > >>
> > > > >>> And what is the plan for system tests? As part of this
> > > > >>> reorganization, could they be moved into
> > > > >>> providers/{PROVIDER_ID}/tests/system? That seems more intuitive
> > > > >>> to me than their current location in
> > > > >>> providers/tests/system/{PROVIDER_ID}/example_foo.py.
> > > > >>
> > > > >> Oh yeah - I missed that in the original structure, as the
> > > > >> "airbyte" provider (which I chose as the first one) did not
> > > > >> contain "system" tests - but for one of the two providers after
> > > > >> that I was planning to make sure system tests are covered. They
> > > > >> are supposed to be moved to "tests/system", of course - Elad had
> > > > >> a similar question and I also explained it in detail in
> > > > >> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
> > > > >>
> > > > >> I hope this answers the questions. If not, I am happy to add more
> > > > >> clarifications :)
> > > > >>
> > > > >>> J.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org