I agree this makes sense.

I was originally concerned that this would make it more difficult to ensure
compatibility across providers for capabilities such as common.sql,
objectstore, and so on.
However, seeing that the "common" pattern would remain the same and it's
only the code layout that is changing, and that we are getting rid of a ton
of generated code, I am positive on this.


On Fri, Jan 10, 2025 at 3:07 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> So I propose letting the "doer" make the decision if we are split.
>
> On Fri, Jan 10, 2025 at 11:52 AM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > Not strong at all, preference Is all. It sounds like Vincent and I are in
> > the hyphen camp and you and Maciej are in the slash camp.
> >
> > +1 on the “I don’t care what code style is used as long as it is
> > programmatically enforced”.
> >
> > -a
> >
> > > On 10 Jan 2025, at 09:41, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > Is there anything else that "tastes" Ash ? A concrete reason that makes
> > you
> > > think the "-" prefix in this case is better than the "/" folder?  How
> > > strong is your "taste" preference and do you think it will have some
> > > lasting effect if we choose to flatten the folder structure?
> > >
> > > I might make a small vote to see what is the preference of people if we
> > > think this is an important aspect.
> > >
> > > BTW. This is why I really love black/ruff formatting - we stopped
> wasting
> > > time on "taste" discussion - it does not matter what is the individual
> > > preference, consistency is more important and I prefer to do stuff that
> > > really matters but if people feel strongly that we should discuss it, I
> > > might make a vote there.
> > >
> > > J.
> > >
> > > On Thu, Jan 9, 2025 at 5:48 PM Ash Berlin-Taylor <a...@apache.org>
> wrote:
> > >
> > >> My preference is for being “more direct” and not having deeply nested
> > >> things where possible — I think Microsoft might be the one case where
> > >> having extra folders makes sense. And I’m fine with things not being
> > >> consistent across providers/groups of providers.
> > >>
> > >> -ash
> > >>
> > >>
> > >>
> > >>> On 8 Jan 2025, at 17:18, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >>>
> > >>> Can you give an example of what might break why having
> > >>> `providers/aapche-beam/src/airflow/providers/apache/beam`?
> > >>>
> > >>> Nothing will break. It's just:
> > >>>
> > >>> * the code will have to be a little more complex as it will have to
> do
> > >> some
> > >>> conditional writes of "-" "/"
> > >>> * there will be inconsistency in the depth of folders - outside it
> will
> > >> be
> > >>> 1, inside it will be 2 (as it is in your example)/
> > >>> * it will be a bit more convention/ complex to limit related
> providers
> > >> (say
> > >>> microsoft) - with the current scheme "providers/microsoft" is the
> > >> directory
> > >>> containing all microsoft providers. If we change it to "-", you have
> to
> > >>> find all sub-directories following "microsoft-*" convention.
> > >>>
> > >>> I am not super-strong on it - we could do either, it's just my
> > preference
> > >>> to use folders for grouping related things (as folders were designed
> > >> for).
> > >>>
> > >>> J.
> > >>>
> > >>> On Wed, Jan 8, 2025 at 5:03 PM Ash Berlin-Taylor <a...@apache.org>
> > wrote:
> > >>>
> > >>>>> And we already have a number of mappings and conventions to handle
> > >> that.
> > >>>>> For example provider I'd mapping to dirs (apache.beam ->
> > apache/beam),
> > >>>> and
> > >>>>> 'apache-airflow-providers-apache-beam' as package na e  and
> > >>>>> airflow/providers/apache/beam as packages inside the distribution.
> > >> Those
> > >>>>> will remain as they are - we cannot change them without breaking
> > >>>>> compatibility.
> > >>>>
> > >>>> Can you give an example of what might break why having
> > >>>> `providers/aapche-beam/src/airflow/providers/apache/beam`?
> > >>>>
> > >>>> -a
> > >>>>
> > >>>>> On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >>>>>
> > >>>>> I think it will be better to keep it.
> > >>>>>
> > >>>>> The reason we have varying levels were to group things together -
> > >> mainly
> > >>>>> Apache related providers, but also Microsoft.
> > >>>>>
> > >>>>> And we already have a number of mappings and conventions to handle
> > >> that.
> > >>>>> For example provider I'd mapping to dirs (apache.beam ->
> > apache/beam),
> > >>>> and
> > >>>>> 'apache-airflow-providers-apache-beam' as package na e  and
> > >>>>> airflow/providers/apache/beam as packages inside the distribution.
> > >> Those
> > >>>>> will remain as they are - we cannot change them without breaking
> > >>>>> compatibility.
> > >>>>>
> > >>>>> So if we change it to a flat structure we will have some
> > >> inconsistencies
> > >>>> -
> > >>>>> in some cases it will be single folder in others (packages) those
> > will
> > >> be
> > >>>>> two folders.
> > >>>>>
> > >>>>> I think it will be more harm than good if we get rid of the
> 'folder'
> > >>>>> structures - some of the code in breeze will have to treat those
> > >>>>> differently as well. Nothing extraordinary and very complex but
> more
> > >>>>> complex-ish than it should be - already on top of handling
> > potentially
> > >>>>> nested folders
> > >>>>>
> > >>>>> So my preference would be to stay with apache/beam - it's just more
> > >>>>> consistently handling the case where provider packages can be
> > one-level
> > >>>>> nested
> > >>>>>
> > >>>>> J
> > >>>>>
> > >>>>>
> > >>>>> wt., 7 sty 2025, 19:00 użytkownik Vincent Beck <
> vincb...@apache.org>
> > >>>>> napisał:
> > >>>>>
> > >>>>>> Good question. I always found it confusing to have some providers
> at
> > >>>>>> different level. Examples:
> > >>>>>> - "airbyte" in "providers" directory (I would qualify it as
> > "regular"
> > >>>>>> provider)
> > >>>>>> - "hive" in "providers/apache"
> > >>>>>> - "amazon" in "providers" but which contains only one sub
> directory
> > >>>> "aws"
> > >>>>>>
> > >>>>>> I would be in favor of using "-" instead of "/" so that all
> > providers
> > >>>> are
> > >>>>>> at the same level.
> > >>>>>>
> > >>>>>>
> > >>>>>> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote:
> > >>>>>>> +1 one to this on general terms, it will hopefully reduce a lot
> of
> > >> the
> > >>>>>> boilerplate we need.
> > >>>>>>>
> > >>>>>>> As for the amazon/aws example specifically that does bring up a
> > >>>>>> question, should we have `/` or `-`.. to give some examples:
> > >>>>>>>
> > >>>>>>> cncf kubernetes: ./providers/cncf/kubernetes or
> > >>>>>> ./providers/cncf-kubernetes
> > >>>>>>> Apache hive: ./providers/apache/hive or ./providers/apache-hive
> > >>>>>>> AWS: ./providers/amazon/aws or ./providers/amazon-aws
> > >>>>>>>
> > >>>>>>> There is no requirement from python etc on one form or the other
> > (as
> > >>>>>> it’s just a folder, not part of the module name), so it’s what
> ever
> > >>>> makes
> > >>>>>> most sense to us.
> > >>>>>>>
> > >>>>>>> Jarek and Dennis (and others): what are your preferences on these
> > >>>> styles?
> > >>>>>>>
> > >>>>>>> -ash
> > >>>>>>>
> > >>>>>>>> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >>>>>>>>
> > >>>>>>>> Oh. . And one other benefit of it will be that we will be able
> to
> > >> get
> > >>>>>> rid
> > >>>>>>>> of about 40% of the "Providers Manager" code. Currently, in
> > >> Providers
> > >>>>>>>> manager we have a lot of "ifs" that make it possible to use
> > >> providers
> > >>>>>> in
> > >>>>>>>> breeze and local environment from the sources. In "production"
> > >>>>>> installation
> > >>>>>>>> we are using "get_provider_info"  entry points to discover
> > providers
> > >>>>>> and
> > >>>>>>>> discover if provider is installed, but when you use current
> > >> providers
> > >>>>>>>> installed in Breeze to inside "airflow", we rely on
> > `provider.yaml`
> > >> to
> > >>>>>> be
> > >>>>>>>> present in the "airflow.providers.PROVIDER_ID" path - so we
> > >>>> effectively
> > >>>>>>>> have two paths of discovering information about the providers
> > >>>>>> installed.
> > >>>>>>>>
> > >>>>>>>> After all providers are migrated to the new structure, all
> > providers
> > >>>>>> are
> > >>>>>>>> separate "distributions" - and when you run `uv sync`  (which
> will
> > >>>>>> install
> > >>>>>>>> all providers thanks to workspace feature) or `pip install -e
> > >>>>>>>> ./providers/aws` (which you will have to do manually to work on
> > the
> > >>>>>>>> provider - if you use `pip` rather than uv) - then we will not
> > have
> > >> to
> > >>>>>> use
> > >>>>>>>> the separate path to read provider.yaml, because the right
> > >> entrypoint
> > >>>>>> for
> > >>>>>>>> the provider will be installed as well - so we will be able to
> get
> > >> rid
> > >>>>>> of
> > >>>>>>>> quite some code that is currently only used in airflow
> development
> > >>>>>>>> environment.
> > >>>>>>>>
> > >>>>>>>> J.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com>
> > >>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Those are very good questions :)
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis
> > >>>>>>>>> <ferru...@amazon.com.invalid> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> To clarify that I understand your diagram correctly, let's say
> > you
> > >>>>>> clone
> > >>>>>>>>>> the Airflow repo to ~/workspace/airflow/.  Does this mean that
> > the
> > >>>>>> AWS Glue
> > >>>>>>>>>> Hook which used to live at
> > >>>>>>>>>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a
> > >> random
> > >>>>>>>>>> example) will be located at
> > >>>>>>>>>>
> > >>>>>>
> > >>>>
> > >>
> >
> ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py?
> > >>>>>>>>>> That feels unnecessarily repetitive to me, maybe it makes
> sense
> > >> but
> > >>>>>> I'm
> > >>>>>>>>>> missing the context?
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Yes - it means that there is this repetitiveness but for a good
> > >>>>>> reason -
> > >>>>>>>>> those two "amazon/aws" serve different purpose:
> > >>>>>>>>>
> > >>>>>>>>> * The first "providers/amazon/aws" is just where the whole
> > provider
> > >>>>>>>>> "complete project" is stored - it's basically a directory where
> > >> "aws
> > >>>>>>>>> provider" is stored, a convenient folder to locate it in, that
> > >> makes
> > >>>>>> it
> > >>>>>>>>> separate from other providers
> > >>>>>>>>> * The second "src/airflow/providers/amazon/aws" - is the python
> > >>>>>>>>> package where the source files is stored - this is how (inside
> > the
> > >>>>>>>>> sub-folder) you tell the actual python "import" to look for the
> > >>>>>> sources.
> > >>>>>>>>>
> > >>>>>>>>> .What really matters is that (eventually)
> > >>>>>>>>> `~/workspace/airflow/providers/amazon/aws/` can be treated as a
> > >>>>>> completely
> > >>>>>>>>> separate python project - a source of a "standalone" provider
> > >> python
> > >>>>>>>>> project.
> > >>>>>>>>> There is a "pyproject.toml" file at the root of it and if you
> do
> > >> this
> > >>>>>> (for
> > >>>>>>>>> example):
> > >>>>>>>>>
> > >>>>>>>>> cd providers/amazon/aws/
> > >>>>>>>>> uv sync
> > >>>>>>>>>
> > >>>>>>>>> And with it you will be able to work on AWS provider
> exclusively
> > >> as a
> > >>>>>>>>> separate project (this is not yet complete with the move -
> tests
> > >> are
> > >>>>>> not
> > >>>>>>>>> entirely possible to run today - but it will be possible as
> next
> > >> step
> > >>>>>> - I
> > >>>>>>>>> explained it in
> > >>>>>>>>>
> > >> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
> > >>>>>>>>>
> > >>>>>>>>> This has a number of benefits, but one of them is that you will
> > be
> > >>>>>> able to
> > >>>>>>>>> build provider by just running `build` command of your
> favourite
> > >>>>>>>>> PEP-standard compliant frontend:
> > >>>>>>>>>
> > >>>>>>>>> cd providers/amazon/aws/
> > >>>>>>>>> `uv build` (or `hatch build` or `poetry build` or `flit build`
> > >> )....
> > >>>>>>>>>
> > >>>>>>>>> This will create  the provider package inside the `dist"
> folder.
> > I
> > >>>>>> just
> > >>>>>>>>> did it in my PR with `uv` in the first "airbyte` project:
> > >>>>>>>>>
> > >>>>>>>>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
> > >>>>>>>>> Building source distribution...
> > >>>>>>>>> Building wheel from source distribution...
> > >>>>>>>>> Successfully built
> > >> dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
> > >>>>>>>>> Successfully built
> > >>>>>>>>> dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
> > >>>>>>>>>
> > >>>>>>>>> That's it. That also allows cases like installing provider
> > packages
> > >>>>>> using
> > >>>>>>>>> git URLs - which I used earlier today to test if the incoming
> PR
> > of
> > >>>>>>>>> pygments is actually solving the problem we had yesteday
> > >>>>>>>>> https://github.com/apache/airflow/pull/45416  (basically we
> just
> > >>>>>> make our
> > >>>>>>>>> provider packages "standard" python packages that all the tools
> > >>>>>> support.
> > >>>>>>>>> Anyone who would like to install a commit, hash or branch
> version
> > >> of
> > >>>>>> the
> > >>>>>>>>> "airbyte" package from main version of Airflow repo will be
> able
> > to
> > >>>>>> do:
> > >>>>>>>>>
> > >>>>>>>>> pip install "apache-airflow-providers-airbyte @ git+
> > >>>>>>>>>
> > https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID";
> > >>>>>>>>>
> > >>>>>>>>> Currently in order to create the package we need to manually
> > >> extract
> > >>>>>> the
> > >>>>>>>>> "amazon" subtree, copy it elsewhere, prepare dynamically some
> > files
> > >>>>>>>>> (pyproject.toml, README.rst and few others) and only then we
> > build
> > >>>>>> the
> > >>>>>>>>> package. All this - copying file structure, creating new files,
> > >>>>>> running the
> > >>>>>>>>> build command after and finally deleting the copied files is
> now
> > -
> > >>>>>>>>> dynamically and under-the-hood created by "breeze
> > >> release-management
> > >>>>>>>>> prepare-provider-packages" command. With this change, the
> > directory
> > >>>>>>>>> structure in `git` repo of ours is totally standard and allows
> us
> > >>>> (and
> > >>>>>>>>> anyone else) to build the package directly from it.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> And what is the plan for system tests?   As part of this
> > >>>>>> reorganization,
> > >>>>>>>>>> could they be moved into providers/{PROVIDER_ID}/tests/system?
> > >> That
> > >>>>>> seems
> > >>>>>>>>>> more intuitive to me than their current location in
> > >>>>>>>>>> providers/tests/system/{PROVIDER_ID}/example_foo.py.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>> Oh yeah - I missed that in the original structure as the
> > "airbyte"
> > >>>>>>>>> provider (that I chose as first one) did not contain the
> "system"
> > >>>>>> tests -
> > >>>>>>>>> but one of the two providers after that i was planning to make
> > sure
> > >>>>>> system
> > >>>>>>>>> tests are covered. They are supposed to be moved to
> > "tests/system"
> > >> of
> > >>>>>>>>> course - Elad had similar question and I also explained it in
> > >> detail
> > >>>>>> in
> > >>>>>>>>>
> > >> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I hope it answers the questions. If not - I am happy to add
> more
> > >>>>>>>>> clarifications :)
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>> J.
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > ---------------------------------------------------------------------
> > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > >>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
> > >>>>>>
> > >>>>>>
> > >>>>
> > >>>>
> > >>>>
> ---------------------------------------------------------------------
> > >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > >>>> For additional commands, e-mail: dev-h...@airflow.apache.org
> > >>>>
> > >>>>
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > >> For additional commands, e-mail: dev-h...@airflow.apache.org
> > >>
> > >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > For additional commands, e-mail: dev-h...@airflow.apache.org
> >
> >
>

Reply via email to