Is there anything else that "tastes" Ash ? A concrete reason that makes you
think the "-" prefix in this case is better than the "/" folder?  How
strong is your "taste" preference and do you think it will have some
lasting effect if we choose to flatten the folder structure?

I might make a small vote to see what is the preference of people if we
think this is an important aspect.

BTW. This is why I really love black/ruff formatting - we stopped wasting
time on "taste" discussion - it does not matter what is the individual
preference, consistency is more important and I prefer to do stuff that
really matters but if people feel strongly that we should discuss it, I
might make a vote there.

J.

On Thu, Jan 9, 2025 at 5:48 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> My preference is for being “more direct” and not having deeply nested
> things where possible — I think Microsoft might be the one case where
> having extra folders makes sense. And I’m fine with things not being
> consistent across providers/groups of providers.
>
> -ash
>
>
>
> > On 8 Jan 2025, at 17:18, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > Can you give an example of what might break why having
> > `providers/aapche-beam/src/airflow/providers/apache/beam`?
> >
> > Nothing will break. It's just:
> >
> > * the code will have to be a little more complex as it will have to do
> some
> > conditional writes of "-" "/"
> > * there will be inconsistency in the depth of folders - outside it will
> be
> > 1, inside it will be 2 (as it is in your example)/
> > * it will be a bit more convention/ complex to limit related providers
> (say
> > microsoft) - with the current scheme "providers/microsoft" is the
> directory
> > containing all microsoft providers. If we change it to "-", you have to
> > find all sub-directories following "microsoft-*" convention.
> >
> > I am not super-strong on it - we could do either, it's just my preference
> > to use folders for grouping related things (as folders were designed
> for).
> >
> > J.
> >
> > On Wed, Jan 8, 2025 at 5:03 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> >>> And we already have a number of mappings and conventions to handle
> that.
> >>> For example provider I'd mapping to dirs (apache.beam -> apache/beam),
> >> and
> >>> 'apache-airflow-providers-apache-beam' as package na e  and
> >>> airflow/providers/apache/beam as packages inside the distribution.
> Those
> >>> will remain as they are - we cannot change them without breaking
> >>> compatibility.
> >>
> >> Can you give an example of what might break why having
> >> `providers/aapche-beam/src/airflow/providers/apache/beam`?
> >>
> >> -a
> >>
> >>> On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>
> >>> I think it will be better to keep it.
> >>>
> >>> The reason we have varying levels were to group things together -
> mainly
> >>> Apache related providers, but also Microsoft.
> >>>
> >>> And we already have a number of mappings and conventions to handle
> that.
> >>> For example provider I'd mapping to dirs (apache.beam -> apache/beam),
> >> and
> >>> 'apache-airflow-providers-apache-beam' as package na e  and
> >>> airflow/providers/apache/beam as packages inside the distribution.
> Those
> >>> will remain as they are - we cannot change them without breaking
> >>> compatibility.
> >>>
> >>> So if we change it to a flat structure we will have some
> inconsistencies
> >> -
> >>> in some cases it will be single folder in others (packages) those will
> be
> >>> two folders.
> >>>
> >>> I think it will be more harm than good if we get rid of the 'folder'
> >>> structures - some of the code in breeze will have to treat those
> >>> differently as well. Nothing extraordinary and very complex but more
> >>> complex-ish than it should be - already on top of handling potentially
> >>> nested folders
> >>>
> >>> So my preference would be to stay with apache/beam - it's just more
> >>> consistently handling the case where provider packages can be one-level
> >>> nested
> >>>
> >>> J
> >>>
> >>>
> >>> wt., 7 sty 2025, 19:00 użytkownik Vincent Beck <vincb...@apache.org>
> >>> napisał:
> >>>
> >>>> Good question. I always found it confusing to have some providers at
> >>>> different level. Examples:
> >>>> - "airbyte" in "providers" directory (I would qualify it as "regular"
> >>>> provider)
> >>>> - "hive" in "providers/apache"
> >>>> - "amazon" in "providers" but which contains only one sub directory
> >> "aws"
> >>>>
> >>>> I would be in favor of using "-" instead of "/" so that all providers
> >> are
> >>>> at the same level.
> >>>>
> >>>>
> >>>> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote:
> >>>>> +1 one to this on general terms, it will hopefully reduce a lot of
> the
> >>>> boilerplate we need.
> >>>>>
> >>>>> As for the amazon/aws example specifically that does bring up a
> >>>> question, should we have `/` or `-`.. to give some examples:
> >>>>>
> >>>>> cncf kubernetes: ./providers/cncf/kubernetes or
> >>>> ./providers/cncf-kubernetes
> >>>>> Apache hive: ./providers/apache/hive or ./providers/apache-hive
> >>>>> AWS: ./providers/amazon/aws or ./providers/amazon-aws
> >>>>>
> >>>>> There is no requirement from python etc on one form or the other (as
> >>>> it’s just a folder, not part of the module name), so it’s what ever
> >> makes
> >>>> most sense to us.
> >>>>>
> >>>>> Jarek and Dennis (and others): what are your preferences on these
> >> styles?
> >>>>>
> >>>>> -ash
> >>>>>
> >>>>>> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>>>>
> >>>>>> Oh. . And one other benefit of it will be that we will be able to
> get
> >>>> rid
> >>>>>> of about 40% of the "Providers Manager" code. Currently, in
> Providers
> >>>>>> manager we have a lot of "ifs" that make it possible to use
> providers
> >>>> in
> >>>>>> breeze and local environment from the sources. In "production"
> >>>> installation
> >>>>>> we are using "get_provider_info"  entry points to discover providers
> >>>> and
> >>>>>> discover if provider is installed, but when you use current
> providers
> >>>>>> installed in Breeze to inside "airflow", we rely on `provider.yaml`
> to
> >>>> be
> >>>>>> present in the "airflow.providers.PROVIDER_ID" path - so we
> >> effectively
> >>>>>> have two paths of discovering information about the providers
> >>>> installed.
> >>>>>>
> >>>>>> After all providers are migrated to the new structure, all providers
> >>>> are
> >>>>>> separate "distributions" - and when you run `uv sync`  (which will
> >>>> install
> >>>>>> all providers thanks to workspace feature) or `pip install -e
> >>>>>> ./providers/aws` (which you will have to do manually to work on the
> >>>>>> provider - if you use `pip` rather than uv) - then we will not have
> to
> >>>> use
> >>>>>> the separate path to read provider.yaml, because the right
> entrypoint
> >>>> for
> >>>>>> the provider will be installed as well - so we will be able to get
> rid
> >>>> of
> >>>>>> quite some code that is currently only used in airflow development
> >>>>>> environment.
> >>>>>>
> >>>>>> J.
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com>
> >> wrote:
> >>>>>>
> >>>>>>> Those are very good questions :)
> >>>>>>>
> >>>>>>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis
> >>>>>>> <ferru...@amazon.com.invalid> wrote:
> >>>>>>>
> >>>>>>>> To clarify that I understand your diagram correctly, let's say you
> >>>> clone
> >>>>>>>> the Airflow repo to ~/workspace/airflow/.  Does this mean that the
> >>>> AWS Glue
> >>>>>>>> Hook which used to live at
> >>>>>>>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a
> random
> >>>>>>>> example) will be located at
> >>>>>>>>
> >>>>
> >>
> ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py?
> >>>>>>>> That feels unnecessarily repetitive to me, maybe it makes sense
> but
> >>>> I'm
> >>>>>>>> missing the context?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes - it means that there is this repetitiveness but for a good
> >>>> reason -
> >>>>>>> those two "amazon/aws" serve different purpose:
> >>>>>>>
> >>>>>>> * The first "providers/amazon/aws" is just where the whole provider
> >>>>>>> "complete project" is stored - it's basically a directory where
> "aws
> >>>>>>> provider" is stored, a convenient folder to locate it in, that
> makes
> >>>> it
> >>>>>>> separate from other providers
> >>>>>>> * The second "src/airflow/providers/amazon/aws" - is the python
> >>>>>>> package where the source files is stored - this is how (inside the
> >>>>>>> sub-folder) you tell the actual python "import" to look for the
> >>>> sources.
> >>>>>>>
> >>>>>>> .What really matters is that (eventually)
> >>>>>>> `~/workspace/airflow/providers/amazon/aws/` can be treated as a
> >>>> completely
> >>>>>>> separate python project - a source of a "standalone" provider
> python
> >>>>>>> project.
> >>>>>>> There is a "pyproject.toml" file at the root of it and if you do
> this
> >>>> (for
> >>>>>>> example):
> >>>>>>>
> >>>>>>> cd providers/amazon/aws/
> >>>>>>> uv sync
> >>>>>>>
> >>>>>>> And with it you will be able to work on AWS provider exclusively
> as a
> >>>>>>> separate project (this is not yet complete with the move - tests
> are
> >>>> not
> >>>>>>> entirely possible to run today - but it will be possible as next
> step
> >>>> - I
> >>>>>>> explained it in
> >>>>>>>
> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
> >>>>>>>
> >>>>>>> This has a number of benefits, but one of them is that you will be
> >>>> able to
> >>>>>>> build provider by just running `build` command of your favourite
> >>>>>>> PEP-standard compliant frontend:
> >>>>>>>
> >>>>>>> cd providers/amazon/aws/
> >>>>>>> `uv build` (or `hatch build` or `poetry build` or `flit build`
> )....
> >>>>>>>
> >>>>>>> This will create  the provider package inside the `dist" folder. I
> >>>> just
> >>>>>>> did it in my PR with `uv` in the first "airbyte` project:
> >>>>>>>
> >>>>>>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
> >>>>>>> Building source distribution...
> >>>>>>> Building wheel from source distribution...
> >>>>>>> Successfully built
> dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
> >>>>>>> Successfully built
> >>>>>>> dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
> >>>>>>>
> >>>>>>> That's it. That also allows cases like installing provider packages
> >>>> using
> >>>>>>> git URLs - which I used earlier today to test if the incoming PR of
> >>>>>>> pygments is actually solving the problem we had yesteday
> >>>>>>> https://github.com/apache/airflow/pull/45416  (basically we just
> >>>> make our
> >>>>>>> provider packages "standard" python packages that all the tools
> >>>> support.
> >>>>>>> Anyone who would like to install a commit, hash or branch version
> of
> >>>> the
> >>>>>>> "airbyte" package from main version of Airflow repo will be able to
> >>>> do:
> >>>>>>>
> >>>>>>> pip install "apache-airflow-providers-airbyte @ git+
> >>>>>>> https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID";
> >>>>>>>
> >>>>>>> Currently in order to create the package we need to manually
> extract
> >>>> the
> >>>>>>> "amazon" subtree, copy it elsewhere, prepare dynamically some files
> >>>>>>> (pyproject.toml, README.rst and few others) and only then we  build
> >>>> the
> >>>>>>> package. All this - copying file structure, creating new files,
> >>>> running the
> >>>>>>> build command after and finally deleting the copied files is now -
> >>>>>>> dynamically and under-the-hood created by "breeze
> release-management
> >>>>>>> prepare-provider-packages" command. With this change, the directory
> >>>>>>> structure in `git` repo of ours is totally standard and allows us
> >> (and
> >>>>>>> anyone else) to build the package directly from it.
> >>>>>>>
> >>>>>>>
> >>>>>>> And what is the plan for system tests?   As part of this
> >>>> reorganization,
> >>>>>>>> could they be moved into providers/{PROVIDER_ID}/tests/system?
> That
> >>>> seems
> >>>>>>>> more intuitive to me than their current location in
> >>>>>>>> providers/tests/system/{PROVIDER_ID}/example_foo.py.
> >>>>>>>>
> >>>>>>>>
> >>>>>>> Oh yeah - I missed that in the original structure as the "airbyte"
> >>>>>>> provider (that I chose as first one) did not contain the "system"
> >>>> tests -
> >>>>>>> but one of the two providers after that i was planning to make sure
> >>>> system
> >>>>>>> tests are covered. They are supposed to be moved to "tests/system"
> of
> >>>>>>> course - Elad had similar question and I also explained it in
> detail
> >>>> in
> >>>>>>>
> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
> >>>>>>>
> >>>>>>>
> >>>>>>> I hope it answers the questions. If not - I am happy to add more
> >>>>>>> clarifications :)
> >>>>>>>
> >>>>>>>
> >>>>>>>> J.
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >>>> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>>>
> >>>>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>
>

Reply via email to