So I propose letting the "doer" make the decision if we are split.
On Fri, Jan 10, 2025 at 11:52 AM Ash Berlin-Taylor <a...@apache.org> wrote: > Not strong at all, preference Is all. It sounds like Vincent and I are in > the hyphen camp and you and Maciej are in the slash camp. > > +1 on the “I don’t care what code style is used as long as it is > programmatically enforced”. > > -a > > > On 10 Jan 2025, at 09:41, Jarek Potiuk <ja...@potiuk.com> wrote: > > > > Is there anything else that "tastes" Ash ? A concrete reason that makes > you > > think the "-" prefix in this case is better than the "/" folder? How > > strong is your "taste" preference and do you think it will have some > > lasting effect if we choose to flatten the folder structure? > > > > I might make a small vote to see what is the preference of people if we > > think this is an important aspect. > > > > BTW. This is why I really love black/ruff formatting - we stopped wasting > > time on "taste" discussion - it does not matter what is the individual > > preference, consistency is more important and I prefer to do stuff that > > really matters but if people feel strongly that we should discuss it, I > > might make a vote there. > > > > J. > > > > On Thu, Jan 9, 2025 at 5:48 PM Ash Berlin-Taylor <a...@apache.org> wrote: > > > >> My preference is for being “more direct” and not having deeply nested > >> things where possible — I think Microsoft might be the one case where > >> having extra folders makes sense. And I’m fine with things not being > >> consistent across providers/groups of providers. > >> > >> -ash > >> > >> > >> > >>> On 8 Jan 2025, at 17:18, Jarek Potiuk <ja...@potiuk.com> wrote: > >>> > >>> Can you give an example of what might break why having > >>> `providers/aapche-beam/src/airflow/providers/apache/beam`? > >>> > >>> Nothing will break. It's just: > >>> > >>> * the code will have to be a little more complex as it will have to do > >> some > >>> conditional writes of "-" "/" > >>> * there will be inconsistency in the depth of folders - outside it will > >> be > >>> 1, inside it will be 2 (as it is in your example)/ > >>> * it will be a bit more convention/ complex to limit related providers > >> (say > >>> microsoft) - with the current scheme "providers/microsoft" is the > >> directory > >>> containing all microsoft providers. If we change it to "-", you have to > >>> find all sub-directories following "microsoft-*" convention. > >>> > >>> I am not super-strong on it - we could do either, it's just my > preference > >>> to use folders for grouping related things (as folders were designed > >> for). > >>> > >>> J. > >>> > >>> On Wed, Jan 8, 2025 at 5:03 PM Ash Berlin-Taylor <a...@apache.org> > wrote: > >>> > >>>>> And we already have a number of mappings and conventions to handle > >> that. > >>>>> For example provider I'd mapping to dirs (apache.beam -> > apache/beam), > >>>> and > >>>>> 'apache-airflow-providers-apache-beam' as package na e and > >>>>> airflow/providers/apache/beam as packages inside the distribution. > >> Those > >>>>> will remain as they are - we cannot change them without breaking > >>>>> compatibility. > >>>> > >>>> Can you give an example of what might break why having > >>>> `providers/aapche-beam/src/airflow/providers/apache/beam`? > >>>> > >>>> -a > >>>> > >>>>> On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> wrote: > >>>>> > >>>>> I think it will be better to keep it. > >>>>> > >>>>> The reason we have varying levels were to group things together - > >> mainly > >>>>> Apache related providers, but also Microsoft. > >>>>> > >>>>> And we already have a number of mappings and conventions to handle > >> that. > >>>>> For example provider I'd mapping to dirs (apache.beam -> > apache/beam), > >>>> and > >>>>> 'apache-airflow-providers-apache-beam' as package na e and > >>>>> airflow/providers/apache/beam as packages inside the distribution. > >> Those > >>>>> will remain as they are - we cannot change them without breaking > >>>>> compatibility. > >>>>> > >>>>> So if we change it to a flat structure we will have some > >> inconsistencies > >>>> - > >>>>> in some cases it will be single folder in others (packages) those > will > >> be > >>>>> two folders. > >>>>> > >>>>> I think it will be more harm than good if we get rid of the 'folder' > >>>>> structures - some of the code in breeze will have to treat those > >>>>> differently as well. Nothing extraordinary and very complex but more > >>>>> complex-ish than it should be - already on top of handling > potentially > >>>>> nested folders > >>>>> > >>>>> So my preference would be to stay with apache/beam - it's just more > >>>>> consistently handling the case where provider packages can be > one-level > >>>>> nested > >>>>> > >>>>> J > >>>>> > >>>>> > >>>>> wt., 7 sty 2025, 19:00 użytkownik Vincent Beck <vincb...@apache.org> > >>>>> napisał: > >>>>> > >>>>>> Good question. I always found it confusing to have some providers at > >>>>>> different level. Examples: > >>>>>> - "airbyte" in "providers" directory (I would qualify it as > "regular" > >>>>>> provider) > >>>>>> - "hive" in "providers/apache" > >>>>>> - "amazon" in "providers" but which contains only one sub directory > >>>> "aws" > >>>>>> > >>>>>> I would be in favor of using "-" instead of "/" so that all > providers > >>>> are > >>>>>> at the same level. > >>>>>> > >>>>>> > >>>>>> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote: > >>>>>>> +1 one to this on general terms, it will hopefully reduce a lot of > >> the > >>>>>> boilerplate we need. > >>>>>>> > >>>>>>> As for the amazon/aws example specifically that does bring up a > >>>>>> question, should we have `/` or `-`.. to give some examples: > >>>>>>> > >>>>>>> cncf kubernetes: ./providers/cncf/kubernetes or > >>>>>> ./providers/cncf-kubernetes > >>>>>>> Apache hive: ./providers/apache/hive or ./providers/apache-hive > >>>>>>> AWS: ./providers/amazon/aws or ./providers/amazon-aws > >>>>>>> > >>>>>>> There is no requirement from python etc on one form or the other > (as > >>>>>> it’s just a folder, not part of the module name), so it’s what ever > >>>> makes > >>>>>> most sense to us. > >>>>>>> > >>>>>>> Jarek and Dennis (and others): what are your preferences on these > >>>> styles? > >>>>>>> > >>>>>>> -ash > >>>>>>> > >>>>>>>> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote: > >>>>>>>> > >>>>>>>> Oh. . And one other benefit of it will be that we will be able to > >> get > >>>>>> rid > >>>>>>>> of about 40% of the "Providers Manager" code. Currently, in > >> Providers > >>>>>>>> manager we have a lot of "ifs" that make it possible to use > >> providers > >>>>>> in > >>>>>>>> breeze and local environment from the sources. In "production" > >>>>>> installation > >>>>>>>> we are using "get_provider_info" entry points to discover > providers > >>>>>> and > >>>>>>>> discover if provider is installed, but when you use current > >> providers > >>>>>>>> installed in Breeze to inside "airflow", we rely on > `provider.yaml` > >> to > >>>>>> be > >>>>>>>> present in the "airflow.providers.PROVIDER_ID" path - so we > >>>> effectively > >>>>>>>> have two paths of discovering information about the providers > >>>>>> installed. > >>>>>>>> > >>>>>>>> After all providers are migrated to the new structure, all > providers > >>>>>> are > >>>>>>>> separate "distributions" - and when you run `uv sync` (which will > >>>>>> install > >>>>>>>> all providers thanks to workspace feature) or `pip install -e > >>>>>>>> ./providers/aws` (which you will have to do manually to work on > the > >>>>>>>> provider - if you use `pip` rather than uv) - then we will not > have > >> to > >>>>>> use > >>>>>>>> the separate path to read provider.yaml, because the right > >> entrypoint > >>>>>> for > >>>>>>>> the provider will be installed as well - so we will be able to get > >> rid > >>>>>> of > >>>>>>>> quite some code that is currently only used in airflow development > >>>>>>>> environment. > >>>>>>>> > >>>>>>>> J. > >>>>>>>> > >>>>>>>> > >>>>>>>> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com> > >>>> wrote: > >>>>>>>> > >>>>>>>>> Those are very good questions :) > >>>>>>>>> > >>>>>>>>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis > >>>>>>>>> <ferru...@amazon.com.invalid> wrote: > >>>>>>>>> > >>>>>>>>>> To clarify that I understand your diagram correctly, let's say > you > >>>>>> clone > >>>>>>>>>> the Airflow repo to ~/workspace/airflow/. Does this mean that > the > >>>>>> AWS Glue > >>>>>>>>>> Hook which used to live at > >>>>>>>>>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a > >> random > >>>>>>>>>> example) will be located at > >>>>>>>>>> > >>>>>> > >>>> > >> > ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py? > >>>>>>>>>> That feels unnecessarily repetitive to me, maybe it makes sense > >> but > >>>>>> I'm > >>>>>>>>>> missing the context? > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> Yes - it means that there is this repetitiveness but for a good > >>>>>> reason - > >>>>>>>>> those two "amazon/aws" serve different purpose: > >>>>>>>>> > >>>>>>>>> * The first "providers/amazon/aws" is just where the whole > provider > >>>>>>>>> "complete project" is stored - it's basically a directory where > >> "aws > >>>>>>>>> provider" is stored, a convenient folder to locate it in, that > >> makes > >>>>>> it > >>>>>>>>> separate from other providers > >>>>>>>>> * The second "src/airflow/providers/amazon/aws" - is the python > >>>>>>>>> package where the source files is stored - this is how (inside > the > >>>>>>>>> sub-folder) you tell the actual python "import" to look for the > >>>>>> sources. > >>>>>>>>> > >>>>>>>>> .What really matters is that (eventually) > >>>>>>>>> `~/workspace/airflow/providers/amazon/aws/` can be treated as a > >>>>>> completely > >>>>>>>>> separate python project - a source of a "standalone" provider > >> python > >>>>>>>>> project. > >>>>>>>>> There is a "pyproject.toml" file at the root of it and if you do > >> this > >>>>>> (for > >>>>>>>>> example): > >>>>>>>>> > >>>>>>>>> cd providers/amazon/aws/ > >>>>>>>>> uv sync > >>>>>>>>> > >>>>>>>>> And with it you will be able to work on AWS provider exclusively > >> as a > >>>>>>>>> separate project (this is not yet complete with the move - tests > >> are > >>>>>> not > >>>>>>>>> entirely possible to run today - but it will be possible as next > >> step > >>>>>> - I > >>>>>>>>> explained it in > >>>>>>>>> > >> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916 > >>>>>>>>> > >>>>>>>>> This has a number of benefits, but one of them is that you will > be > >>>>>> able to > >>>>>>>>> build provider by just running `build` command of your favourite > >>>>>>>>> PEP-standard compliant frontend: > >>>>>>>>> > >>>>>>>>> cd providers/amazon/aws/ > >>>>>>>>> `uv build` (or `hatch build` or `poetry build` or `flit build` > >> ).... > >>>>>>>>> > >>>>>>>>> This will create the provider package inside the `dist" folder. > I > >>>>>> just > >>>>>>>>> did it in my PR with `uv` in the first "airbyte` project: > >>>>>>>>> > >>>>>>>>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build > >>>>>>>>> Building source distribution... > >>>>>>>>> Building wheel from source distribution... > >>>>>>>>> Successfully built > >> dist/apache_airflow_providers_airbyte-5.0.0.tar.gz > >>>>>>>>> Successfully built > >>>>>>>>> dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl > >>>>>>>>> > >>>>>>>>> That's it. That also allows cases like installing provider > packages > >>>>>> using > >>>>>>>>> git URLs - which I used earlier today to test if the incoming PR > of > >>>>>>>>> pygments is actually solving the problem we had yesteday > >>>>>>>>> https://github.com/apache/airflow/pull/45416 (basically we just > >>>>>> make our > >>>>>>>>> provider packages "standard" python packages that all the tools > >>>>>> support. > >>>>>>>>> Anyone who would like to install a commit, hash or branch version > >> of > >>>>>> the > >>>>>>>>> "airbyte" package from main version of Airflow repo will be able > to > >>>>>> do: > >>>>>>>>> > >>>>>>>>> pip install "apache-airflow-providers-airbyte @ git+ > >>>>>>>>> > https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID" > >>>>>>>>> > >>>>>>>>> Currently in order to create the package we need to manually > >> extract > >>>>>> the > >>>>>>>>> "amazon" subtree, copy it elsewhere, prepare dynamically some > files > >>>>>>>>> (pyproject.toml, README.rst and few others) and only then we > build > >>>>>> the > >>>>>>>>> package. All this - copying file structure, creating new files, > >>>>>> running the > >>>>>>>>> build command after and finally deleting the copied files is now > - > >>>>>>>>> dynamically and under-the-hood created by "breeze > >> release-management > >>>>>>>>> prepare-provider-packages" command. With this change, the > directory > >>>>>>>>> structure in `git` repo of ours is totally standard and allows us > >>>> (and > >>>>>>>>> anyone else) to build the package directly from it. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> And what is the plan for system tests? As part of this > >>>>>> reorganization, > >>>>>>>>>> could they be moved into providers/{PROVIDER_ID}/tests/system? > >> That > >>>>>> seems > >>>>>>>>>> more intuitive to me than their current location in > >>>>>>>>>> providers/tests/system/{PROVIDER_ID}/example_foo.py. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> Oh yeah - I missed that in the original structure as the > "airbyte" > >>>>>>>>> provider (that I chose as first one) did not contain the "system" > >>>>>> tests - > >>>>>>>>> but one of the two providers after that i was planning to make > sure > >>>>>> system > >>>>>>>>> tests are covered. They are supposed to be moved to > "tests/system" > >> of > >>>>>>>>> course - Elad had similar question and I also explained it in > >> detail > >>>>>> in > >>>>>>>>> > >> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916 > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> I hope it answers the questions. If not - I am happy to add more > >>>>>>>>> clarifications :) > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> J. > >>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> > --------------------------------------------------------------------- > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > >>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org > >>>>>> > >>>>>> > >>>> > >>>> > >>>> --------------------------------------------------------------------- > >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > >>>> For additional commands, e-mail: dev-h...@airflow.apache.org > >>>> > >>>> > >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > >> For additional commands, e-mail: dev-h...@airflow.apache.org > >> > >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > For additional commands, e-mail: dev-h...@airflow.apache.org > >