Is there anything else that "tastes" Ash ? A concrete reason that makes you think the "-" prefix in this case is better than the "/" folder? How strong is your "taste" preference and do you think it will have some lasting effect if we choose to flatten the folder structure?
I might make a small vote to see what is the preference of people if we think this is an important aspect. BTW. This is why I really love black/ruff formatting - we stopped wasting time on "taste" discussion - it does not matter what is the individual preference, consistency is more important and I prefer to do stuff that really matters but if people feel strongly that we should discuss it, I might make a vote there. J. On Thu, Jan 9, 2025 at 5:48 PM Ash Berlin-Taylor <a...@apache.org> wrote: > My preference is for being “more direct” and not having deeply nested > things where possible — I think Microsoft might be the one case where > having extra folders makes sense. And I’m fine with things not being > consistent across providers/groups of providers. > > -ash > > > > > On 8 Jan 2025, at 17:18, Jarek Potiuk <ja...@potiuk.com> wrote: > > > > Can you give an example of what might break why having > > `providers/aapche-beam/src/airflow/providers/apache/beam`? > > > > Nothing will break. It's just: > > > > * the code will have to be a little more complex as it will have to do > some > > conditional writes of "-" "/" > > * there will be inconsistency in the depth of folders - outside it will > be > > 1, inside it will be 2 (as it is in your example)/ > > * it will be a bit more convention/ complex to limit related providers > (say > > microsoft) - with the current scheme "providers/microsoft" is the > directory > > containing all microsoft providers. If we change it to "-", you have to > > find all sub-directories following "microsoft-*" convention. > > > > I am not super-strong on it - we could do either, it's just my preference > > to use folders for grouping related things (as folders were designed > for). > > > > J. > > > > On Wed, Jan 8, 2025 at 5:03 PM Ash Berlin-Taylor <a...@apache.org> wrote: > > > >>> And we already have a number of mappings and conventions to handle > that. > >>> For example provider I'd mapping to dirs (apache.beam -> apache/beam), > >> and > >>> 'apache-airflow-providers-apache-beam' as package na e and > >>> airflow/providers/apache/beam as packages inside the distribution. > Those > >>> will remain as they are - we cannot change them without breaking > >>> compatibility. > >> > >> Can you give an example of what might break why having > >> `providers/aapche-beam/src/airflow/providers/apache/beam`? > >> > >> -a > >> > >>> On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> wrote: > >>> > >>> I think it will be better to keep it. > >>> > >>> The reason we have varying levels were to group things together - > mainly > >>> Apache related providers, but also Microsoft. > >>> > >>> And we already have a number of mappings and conventions to handle > that. > >>> For example provider I'd mapping to dirs (apache.beam -> apache/beam), > >> and > >>> 'apache-airflow-providers-apache-beam' as package na e and > >>> airflow/providers/apache/beam as packages inside the distribution. > Those > >>> will remain as they are - we cannot change them without breaking > >>> compatibility. > >>> > >>> So if we change it to a flat structure we will have some > inconsistencies > >> - > >>> in some cases it will be single folder in others (packages) those will > be > >>> two folders. > >>> > >>> I think it will be more harm than good if we get rid of the 'folder' > >>> structures - some of the code in breeze will have to treat those > >>> differently as well. Nothing extraordinary and very complex but more > >>> complex-ish than it should be - already on top of handling potentially > >>> nested folders > >>> > >>> So my preference would be to stay with apache/beam - it's just more > >>> consistently handling the case where provider packages can be one-level > >>> nested > >>> > >>> J > >>> > >>> > >>> wt., 7 sty 2025, 19:00 użytkownik Vincent Beck <vincb...@apache.org> > >>> napisał: > >>> > >>>> Good question. I always found it confusing to have some providers at > >>>> different level. Examples: > >>>> - "airbyte" in "providers" directory (I would qualify it as "regular" > >>>> provider) > >>>> - "hive" in "providers/apache" > >>>> - "amazon" in "providers" but which contains only one sub directory > >> "aws" > >>>> > >>>> I would be in favor of using "-" instead of "/" so that all providers > >> are > >>>> at the same level. > >>>> > >>>> > >>>> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote: > >>>>> +1 one to this on general terms, it will hopefully reduce a lot of > the > >>>> boilerplate we need. > >>>>> > >>>>> As for the amazon/aws example specifically that does bring up a > >>>> question, should we have `/` or `-`.. to give some examples: > >>>>> > >>>>> cncf kubernetes: ./providers/cncf/kubernetes or > >>>> ./providers/cncf-kubernetes > >>>>> Apache hive: ./providers/apache/hive or ./providers/apache-hive > >>>>> AWS: ./providers/amazon/aws or ./providers/amazon-aws > >>>>> > >>>>> There is no requirement from python etc on one form or the other (as > >>>> it’s just a folder, not part of the module name), so it’s what ever > >> makes > >>>> most sense to us. > >>>>> > >>>>> Jarek and Dennis (and others): what are your preferences on these > >> styles? > >>>>> > >>>>> -ash > >>>>> > >>>>>> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote: > >>>>>> > >>>>>> Oh. . And one other benefit of it will be that we will be able to > get > >>>> rid > >>>>>> of about 40% of the "Providers Manager" code. Currently, in > Providers > >>>>>> manager we have a lot of "ifs" that make it possible to use > providers > >>>> in > >>>>>> breeze and local environment from the sources. In "production" > >>>> installation > >>>>>> we are using "get_provider_info" entry points to discover providers > >>>> and > >>>>>> discover if provider is installed, but when you use current > providers > >>>>>> installed in Breeze to inside "airflow", we rely on `provider.yaml` > to > >>>> be > >>>>>> present in the "airflow.providers.PROVIDER_ID" path - so we > >> effectively > >>>>>> have two paths of discovering information about the providers > >>>> installed. > >>>>>> > >>>>>> After all providers are migrated to the new structure, all providers > >>>> are > >>>>>> separate "distributions" - and when you run `uv sync` (which will > >>>> install > >>>>>> all providers thanks to workspace feature) or `pip install -e > >>>>>> ./providers/aws` (which you will have to do manually to work on the > >>>>>> provider - if you use `pip` rather than uv) - then we will not have > to > >>>> use > >>>>>> the separate path to read provider.yaml, because the right > entrypoint > >>>> for > >>>>>> the provider will be installed as well - so we will be able to get > rid > >>>> of > >>>>>> quite some code that is currently only used in airflow development > >>>>>> environment. > >>>>>> > >>>>>> J. > >>>>>> > >>>>>> > >>>>>> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com> > >> wrote: > >>>>>> > >>>>>>> Those are very good questions :) > >>>>>>> > >>>>>>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis > >>>>>>> <ferru...@amazon.com.invalid> wrote: > >>>>>>> > >>>>>>>> To clarify that I understand your diagram correctly, let's say you > >>>> clone > >>>>>>>> the Airflow repo to ~/workspace/airflow/. Does this mean that the > >>>> AWS Glue > >>>>>>>> Hook which used to live at > >>>>>>>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a > random > >>>>>>>> example) will be located at > >>>>>>>> > >>>> > >> > ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py? > >>>>>>>> That feels unnecessarily repetitive to me, maybe it makes sense > but > >>>> I'm > >>>>>>>> missing the context? > >>>>>>>> > >>>>>>> > >>>>>>> Yes - it means that there is this repetitiveness but for a good > >>>> reason - > >>>>>>> those two "amazon/aws" serve different purpose: > >>>>>>> > >>>>>>> * The first "providers/amazon/aws" is just where the whole provider > >>>>>>> "complete project" is stored - it's basically a directory where > "aws > >>>>>>> provider" is stored, a convenient folder to locate it in, that > makes > >>>> it > >>>>>>> separate from other providers > >>>>>>> * The second "src/airflow/providers/amazon/aws" - is the python > >>>>>>> package where the source files is stored - this is how (inside the > >>>>>>> sub-folder) you tell the actual python "import" to look for the > >>>> sources. > >>>>>>> > >>>>>>> .What really matters is that (eventually) > >>>>>>> `~/workspace/airflow/providers/amazon/aws/` can be treated as a > >>>> completely > >>>>>>> separate python project - a source of a "standalone" provider > python > >>>>>>> project. > >>>>>>> There is a "pyproject.toml" file at the root of it and if you do > this > >>>> (for > >>>>>>> example): > >>>>>>> > >>>>>>> cd providers/amazon/aws/ > >>>>>>> uv sync > >>>>>>> > >>>>>>> And with it you will be able to work on AWS provider exclusively > as a > >>>>>>> separate project (this is not yet complete with the move - tests > are > >>>> not > >>>>>>> entirely possible to run today - but it will be possible as next > step > >>>> - I > >>>>>>> explained it in > >>>>>>> > https://github.com/apache/airflow/pull/45259#issuecomment-2572427916 > >>>>>>> > >>>>>>> This has a number of benefits, but one of them is that you will be > >>>> able to > >>>>>>> build provider by just running `build` command of your favourite > >>>>>>> PEP-standard compliant frontend: > >>>>>>> > >>>>>>> cd providers/amazon/aws/ > >>>>>>> `uv build` (or `hatch build` or `poetry build` or `flit build` > ).... > >>>>>>> > >>>>>>> This will create the provider package inside the `dist" folder. I > >>>> just > >>>>>>> did it in my PR with `uv` in the first "airbyte` project: > >>>>>>> > >>>>>>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build > >>>>>>> Building source distribution... > >>>>>>> Building wheel from source distribution... > >>>>>>> Successfully built > dist/apache_airflow_providers_airbyte-5.0.0.tar.gz > >>>>>>> Successfully built > >>>>>>> dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl > >>>>>>> > >>>>>>> That's it. That also allows cases like installing provider packages > >>>> using > >>>>>>> git URLs - which I used earlier today to test if the incoming PR of > >>>>>>> pygments is actually solving the problem we had yesteday > >>>>>>> https://github.com/apache/airflow/pull/45416 (basically we just > >>>> make our > >>>>>>> provider packages "standard" python packages that all the tools > >>>> support. > >>>>>>> Anyone who would like to install a commit, hash or branch version > of > >>>> the > >>>>>>> "airbyte" package from main version of Airflow repo will be able to > >>>> do: > >>>>>>> > >>>>>>> pip install "apache-airflow-providers-airbyte @ git+ > >>>>>>> https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID" > >>>>>>> > >>>>>>> Currently in order to create the package we need to manually > extract > >>>> the > >>>>>>> "amazon" subtree, copy it elsewhere, prepare dynamically some files > >>>>>>> (pyproject.toml, README.rst and few others) and only then we build > >>>> the > >>>>>>> package. All this - copying file structure, creating new files, > >>>> running the > >>>>>>> build command after and finally deleting the copied files is now - > >>>>>>> dynamically and under-the-hood created by "breeze > release-management > >>>>>>> prepare-provider-packages" command. With this change, the directory > >>>>>>> structure in `git` repo of ours is totally standard and allows us > >> (and > >>>>>>> anyone else) to build the package directly from it. > >>>>>>> > >>>>>>> > >>>>>>> And what is the plan for system tests? As part of this > >>>> reorganization, > >>>>>>>> could they be moved into providers/{PROVIDER_ID}/tests/system? > That > >>>> seems > >>>>>>>> more intuitive to me than their current location in > >>>>>>>> providers/tests/system/{PROVIDER_ID}/example_foo.py. > >>>>>>>> > >>>>>>>> > >>>>>>> Oh yeah - I missed that in the original structure as the "airbyte" > >>>>>>> provider (that I chose as first one) did not contain the "system" > >>>> tests - > >>>>>>> but one of the two providers after that i was planning to make sure > >>>> system > >>>>>>> tests are covered. They are supposed to be moved to "tests/system" > of > >>>>>>> course - Elad had similar question and I also explained it in > detail > >>>> in > >>>>>>> > https://github.com/apache/airflow/pull/45259#issuecomment-2572427916 > >>>>>>> > >>>>>>> > >>>>>>> I hope it answers the questions. If not - I am happy to add more > >>>>>>> clarifications :) > >>>>>>> > >>>>>>> > >>>>>>>> J. > >>>>>>>> > >>>>>>> > >>>>> > >>>>> > >>>> > >>>> --------------------------------------------------------------------- > >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > >>>> For additional commands, e-mail: dev-h...@airflow.apache.org > >>>> > >>>> > >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > >> For additional commands, e-mail: dev-h...@airflow.apache.org > >> > >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > For additional commands, e-mail: dev-h...@airflow.apache.org > >