Not strong at all, preference Is all. It sounds like Vincent and I are in the hyphen camp and you and Maciej are in the slash camp.
+1 on the “I don’t care what code style is used as long as it is programmatically enforced”. -a > On 10 Jan 2025, at 09:41, Jarek Potiuk <ja...@potiuk.com> wrote: > > Is there anything else that "tastes" Ash ? A concrete reason that makes you > think the "-" prefix in this case is better than the "/" folder? How > strong is your "taste" preference and do you think it will have some > lasting effect if we choose to flatten the folder structure? > > I might make a small vote to see what is the preference of people if we > think this is an important aspect. > > BTW. This is why I really love black/ruff formatting - we stopped wasting > time on "taste" discussion - it does not matter what is the individual > preference, consistency is more important and I prefer to do stuff that > really matters but if people feel strongly that we should discuss it, I > might make a vote there. > > J. > > On Thu, Jan 9, 2025 at 5:48 PM Ash Berlin-Taylor <a...@apache.org> wrote: > >> My preference is for being “more direct” and not having deeply nested >> things where possible — I think Microsoft might be the one case where >> having extra folders makes sense. And I’m fine with things not being >> consistent across providers/groups of providers. >> >> -ash >> >> >> >>> On 8 Jan 2025, at 17:18, Jarek Potiuk <ja...@potiuk.com> wrote: >>> >>> Can you give an example of what might break why having >>> `providers/aapche-beam/src/airflow/providers/apache/beam`? >>> >>> Nothing will break. It's just: >>> >>> * the code will have to be a little more complex as it will have to do >> some >>> conditional writes of "-" "/" >>> * there will be inconsistency in the depth of folders - outside it will >> be >>> 1, inside it will be 2 (as it is in your example)/ >>> * it will be a bit more convention/ complex to limit related providers >> (say >>> microsoft) - with the current scheme "providers/microsoft" is the >> directory >>> containing all microsoft providers. If we change it to "-", you have to >>> find all sub-directories following "microsoft-*" convention. >>> >>> I am not super-strong on it - we could do either, it's just my preference >>> to use folders for grouping related things (as folders were designed >> for). >>> >>> J. >>> >>> On Wed, Jan 8, 2025 at 5:03 PM Ash Berlin-Taylor <a...@apache.org> wrote: >>> >>>>> And we already have a number of mappings and conventions to handle >> that. >>>>> For example provider I'd mapping to dirs (apache.beam -> apache/beam), >>>> and >>>>> 'apache-airflow-providers-apache-beam' as package na e and >>>>> airflow/providers/apache/beam as packages inside the distribution. >> Those >>>>> will remain as they are - we cannot change them without breaking >>>>> compatibility. >>>> >>>> Can you give an example of what might break why having >>>> `providers/aapche-beam/src/airflow/providers/apache/beam`? >>>> >>>> -a >>>> >>>>> On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> wrote: >>>>> >>>>> I think it will be better to keep it. >>>>> >>>>> The reason we have varying levels were to group things together - >> mainly >>>>> Apache related providers, but also Microsoft. >>>>> >>>>> And we already have a number of mappings and conventions to handle >> that. >>>>> For example provider I'd mapping to dirs (apache.beam -> apache/beam), >>>> and >>>>> 'apache-airflow-providers-apache-beam' as package na e and >>>>> airflow/providers/apache/beam as packages inside the distribution. >> Those >>>>> will remain as they are - we cannot change them without breaking >>>>> compatibility. >>>>> >>>>> So if we change it to a flat structure we will have some >> inconsistencies >>>> - >>>>> in some cases it will be single folder in others (packages) those will >> be >>>>> two folders. >>>>> >>>>> I think it will be more harm than good if we get rid of the 'folder' >>>>> structures - some of the code in breeze will have to treat those >>>>> differently as well. Nothing extraordinary and very complex but more >>>>> complex-ish than it should be - already on top of handling potentially >>>>> nested folders >>>>> >>>>> So my preference would be to stay with apache/beam - it's just more >>>>> consistently handling the case where provider packages can be one-level >>>>> nested >>>>> >>>>> J >>>>> >>>>> >>>>> wt., 7 sty 2025, 19:00 użytkownik Vincent Beck <vincb...@apache.org> >>>>> napisał: >>>>> >>>>>> Good question. I always found it confusing to have some providers at >>>>>> different level. Examples: >>>>>> - "airbyte" in "providers" directory (I would qualify it as "regular" >>>>>> provider) >>>>>> - "hive" in "providers/apache" >>>>>> - "amazon" in "providers" but which contains only one sub directory >>>> "aws" >>>>>> >>>>>> I would be in favor of using "-" instead of "/" so that all providers >>>> are >>>>>> at the same level. >>>>>> >>>>>> >>>>>> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote: >>>>>>> +1 one to this on general terms, it will hopefully reduce a lot of >> the >>>>>> boilerplate we need. >>>>>>> >>>>>>> As for the amazon/aws example specifically that does bring up a >>>>>> question, should we have `/` or `-`.. to give some examples: >>>>>>> >>>>>>> cncf kubernetes: ./providers/cncf/kubernetes or >>>>>> ./providers/cncf-kubernetes >>>>>>> Apache hive: ./providers/apache/hive or ./providers/apache-hive >>>>>>> AWS: ./providers/amazon/aws or ./providers/amazon-aws >>>>>>> >>>>>>> There is no requirement from python etc on one form or the other (as >>>>>> it’s just a folder, not part of the module name), so it’s what ever >>>> makes >>>>>> most sense to us. >>>>>>> >>>>>>> Jarek and Dennis (and others): what are your preferences on these >>>> styles? >>>>>>> >>>>>>> -ash >>>>>>> >>>>>>>> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote: >>>>>>>> >>>>>>>> Oh. . And one other benefit of it will be that we will be able to >> get >>>>>> rid >>>>>>>> of about 40% of the "Providers Manager" code. Currently, in >> Providers >>>>>>>> manager we have a lot of "ifs" that make it possible to use >> providers >>>>>> in >>>>>>>> breeze and local environment from the sources. In "production" >>>>>> installation >>>>>>>> we are using "get_provider_info" entry points to discover providers >>>>>> and >>>>>>>> discover if provider is installed, but when you use current >> providers >>>>>>>> installed in Breeze to inside "airflow", we rely on `provider.yaml` >> to >>>>>> be >>>>>>>> present in the "airflow.providers.PROVIDER_ID" path - so we >>>> effectively >>>>>>>> have two paths of discovering information about the providers >>>>>> installed. >>>>>>>> >>>>>>>> After all providers are migrated to the new structure, all providers >>>>>> are >>>>>>>> separate "distributions" - and when you run `uv sync` (which will >>>>>> install >>>>>>>> all providers thanks to workspace feature) or `pip install -e >>>>>>>> ./providers/aws` (which you will have to do manually to work on the >>>>>>>> provider - if you use `pip` rather than uv) - then we will not have >> to >>>>>> use >>>>>>>> the separate path to read provider.yaml, because the right >> entrypoint >>>>>> for >>>>>>>> the provider will be installed as well - so we will be able to get >> rid >>>>>> of >>>>>>>> quite some code that is currently only used in airflow development >>>>>>>> environment. >>>>>>>> >>>>>>>> J. >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com> >>>> wrote: >>>>>>>> >>>>>>>>> Those are very good questions :) >>>>>>>>> >>>>>>>>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis >>>>>>>>> <ferru...@amazon.com.invalid> wrote: >>>>>>>>> >>>>>>>>>> To clarify that I understand your diagram correctly, let's say you >>>>>> clone >>>>>>>>>> the Airflow repo to ~/workspace/airflow/. Does this mean that the >>>>>> AWS Glue >>>>>>>>>> Hook which used to live at >>>>>>>>>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a >> random >>>>>>>>>> example) will be located at >>>>>>>>>> >>>>>> >>>> >> ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py? >>>>>>>>>> That feels unnecessarily repetitive to me, maybe it makes sense >> but >>>>>> I'm >>>>>>>>>> missing the context? >>>>>>>>>> >>>>>>>>> >>>>>>>>> Yes - it means that there is this repetitiveness but for a good >>>>>> reason - >>>>>>>>> those two "amazon/aws" serve different purpose: >>>>>>>>> >>>>>>>>> * The first "providers/amazon/aws" is just where the whole provider >>>>>>>>> "complete project" is stored - it's basically a directory where >> "aws >>>>>>>>> provider" is stored, a convenient folder to locate it in, that >> makes >>>>>> it >>>>>>>>> separate from other providers >>>>>>>>> * The second "src/airflow/providers/amazon/aws" - is the python >>>>>>>>> package where the source files is stored - this is how (inside the >>>>>>>>> sub-folder) you tell the actual python "import" to look for the >>>>>> sources. >>>>>>>>> >>>>>>>>> .What really matters is that (eventually) >>>>>>>>> `~/workspace/airflow/providers/amazon/aws/` can be treated as a >>>>>> completely >>>>>>>>> separate python project - a source of a "standalone" provider >> python >>>>>>>>> project. >>>>>>>>> There is a "pyproject.toml" file at the root of it and if you do >> this >>>>>> (for >>>>>>>>> example): >>>>>>>>> >>>>>>>>> cd providers/amazon/aws/ >>>>>>>>> uv sync >>>>>>>>> >>>>>>>>> And with it you will be able to work on AWS provider exclusively >> as a >>>>>>>>> separate project (this is not yet complete with the move - tests >> are >>>>>> not >>>>>>>>> entirely possible to run today - but it will be possible as next >> step >>>>>> - I >>>>>>>>> explained it in >>>>>>>>> >> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916 >>>>>>>>> >>>>>>>>> This has a number of benefits, but one of them is that you will be >>>>>> able to >>>>>>>>> build provider by just running `build` command of your favourite >>>>>>>>> PEP-standard compliant frontend: >>>>>>>>> >>>>>>>>> cd providers/amazon/aws/ >>>>>>>>> `uv build` (or `hatch build` or `poetry build` or `flit build` >> ).... >>>>>>>>> >>>>>>>>> This will create the provider package inside the `dist" folder. I >>>>>> just >>>>>>>>> did it in my PR with `uv` in the first "airbyte` project: >>>>>>>>> >>>>>>>>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build >>>>>>>>> Building source distribution... >>>>>>>>> Building wheel from source distribution... >>>>>>>>> Successfully built >> dist/apache_airflow_providers_airbyte-5.0.0.tar.gz >>>>>>>>> Successfully built >>>>>>>>> dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl >>>>>>>>> >>>>>>>>> That's it. That also allows cases like installing provider packages >>>>>> using >>>>>>>>> git URLs - which I used earlier today to test if the incoming PR of >>>>>>>>> pygments is actually solving the problem we had yesteday >>>>>>>>> https://github.com/apache/airflow/pull/45416 (basically we just >>>>>> make our >>>>>>>>> provider packages "standard" python packages that all the tools >>>>>> support. >>>>>>>>> Anyone who would like to install a commit, hash or branch version >> of >>>>>> the >>>>>>>>> "airbyte" package from main version of Airflow repo will be able to >>>>>> do: >>>>>>>>> >>>>>>>>> pip install "apache-airflow-providers-airbyte @ git+ >>>>>>>>> https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID" >>>>>>>>> >>>>>>>>> Currently in order to create the package we need to manually >> extract >>>>>> the >>>>>>>>> "amazon" subtree, copy it elsewhere, prepare dynamically some files >>>>>>>>> (pyproject.toml, README.rst and few others) and only then we build >>>>>> the >>>>>>>>> package. All this - copying file structure, creating new files, >>>>>> running the >>>>>>>>> build command after and finally deleting the copied files is now - >>>>>>>>> dynamically and under-the-hood created by "breeze >> release-management >>>>>>>>> prepare-provider-packages" command. With this change, the directory >>>>>>>>> structure in `git` repo of ours is totally standard and allows us >>>> (and >>>>>>>>> anyone else) to build the package directly from it. >>>>>>>>> >>>>>>>>> >>>>>>>>> And what is the plan for system tests? As part of this >>>>>> reorganization, >>>>>>>>>> could they be moved into providers/{PROVIDER_ID}/tests/system? >> That >>>>>> seems >>>>>>>>>> more intuitive to me than their current location in >>>>>>>>>> providers/tests/system/{PROVIDER_ID}/example_foo.py. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Oh yeah - I missed that in the original structure as the "airbyte" >>>>>>>>> provider (that I chose as first one) did not contain the "system" >>>>>> tests - >>>>>>>>> but one of the two providers after that i was planning to make sure >>>>>> system >>>>>>>>> tests are covered. They are supposed to be moved to "tests/system" >> of >>>>>>>>> course - Elad had similar question and I also explained it in >> detail >>>>>> in >>>>>>>>> >> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916 >>>>>>>>> >>>>>>>>> >>>>>>>>> I hope it answers the questions. If not - I am happy to add more >>>>>>>>> clarifications :) >>>>>>>>> >>>>>>>>> >>>>>>>>>> J. >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> --------------------------------------------------------------------- >>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>>>>> >>>>>> >>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>>> >>>> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >> For additional commands, e-mail: dev-h...@airflow.apache.org >> >> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org For additional commands, e-mail: dev-h...@airflow.apache.org