So I propose letting the "doer" make the decision if we are split.

On Fri, Jan 10, 2025 at 11:52 AM Ash Berlin-Taylor <a...@apache.org> wrote:

> Not strong at all, preference Is all. It sounds like Vincent and I are in
> the hyphen camp and you and Maciej are in the slash camp.
>
> +1 on the “I don’t care what code style is used as long as it is
> programmatically enforced”.
>
> -a
>
> > On 10 Jan 2025, at 09:41, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > Is there anything else that "tastes" Ash ? A concrete reason that makes
> you
> > think the "-" prefix in this case is better than the "/" folder?  How
> > strong is your "taste" preference and do you think it will have some
> > lasting effect if we choose to flatten the folder structure?
> >
> > I might make a small vote to see what is the preference of people if we
> > think this is an important aspect.
> >
> > BTW. This is why I really love black/ruff formatting - we stopped wasting
> > time on "taste" discussion - it does not matter what is the individual
> > preference, consistency is more important and I prefer to do stuff that
> > really matters but if people feel strongly that we should discuss it, I
> > might make a vote there.
> >
> > J.
> >
> > On Thu, Jan 9, 2025 at 5:48 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> >> My preference is for being “more direct” and not having deeply nested
> >> things where possible — I think Microsoft might be the one case where
> >> having extra folders makes sense. And I’m fine with things not being
> >> consistent across providers/groups of providers.
> >>
> >> -ash
> >>
> >>
> >>
> >>> On 8 Jan 2025, at 17:18, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>
> >>> Can you give an example of what might break why having
> >>> `providers/aapche-beam/src/airflow/providers/apache/beam`?
> >>>
> >>> Nothing will break. It's just:
> >>>
> >>> * the code will have to be a little more complex as it will have to do
> >> some
> >>> conditional writes of "-" "/"
> >>> * there will be inconsistency in the depth of folders - outside it will
> >> be
> >>> 1, inside it will be 2 (as it is in your example)/
> >>> * it will be a bit more convention/ complex to limit related providers
> >> (say
> >>> microsoft) - with the current scheme "providers/microsoft" is the
> >> directory
> >>> containing all microsoft providers. If we change it to "-", you have to
> >>> find all sub-directories following "microsoft-*" convention.
> >>>
> >>> I am not super-strong on it - we could do either, it's just my
> preference
> >>> to use folders for grouping related things (as folders were designed
> >> for).
> >>>
> >>> J.
> >>>
> >>> On Wed, Jan 8, 2025 at 5:03 PM Ash Berlin-Taylor <a...@apache.org>
> wrote:
> >>>
> >>>>> And we already have a number of mappings and conventions to handle
> >> that.
> >>>>> For example provider I'd mapping to dirs (apache.beam ->
> apache/beam),
> >>>> and
> >>>>> 'apache-airflow-providers-apache-beam' as package na e  and
> >>>>> airflow/providers/apache/beam as packages inside the distribution.
> >> Those
> >>>>> will remain as they are - we cannot change them without breaking
> >>>>> compatibility.
> >>>>
> >>>> Can you give an example of what might break why having
> >>>> `providers/aapche-beam/src/airflow/providers/apache/beam`?
> >>>>
> >>>> -a
> >>>>
> >>>>> On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>>>
> >>>>> I think it will be better to keep it.
> >>>>>
> >>>>> The reason we have varying levels were to group things together -
> >> mainly
> >>>>> Apache related providers, but also Microsoft.
> >>>>>
> >>>>> And we already have a number of mappings and conventions to handle
> >> that.
> >>>>> For example provider I'd mapping to dirs (apache.beam ->
> apache/beam),
> >>>> and
> >>>>> 'apache-airflow-providers-apache-beam' as package na e  and
> >>>>> airflow/providers/apache/beam as packages inside the distribution.
> >> Those
> >>>>> will remain as they are - we cannot change them without breaking
> >>>>> compatibility.
> >>>>>
> >>>>> So if we change it to a flat structure we will have some
> >> inconsistencies
> >>>> -
> >>>>> in some cases it will be single folder in others (packages) those
> will
> >> be
> >>>>> two folders.
> >>>>>
> >>>>> I think it will be more harm than good if we get rid of the 'folder'
> >>>>> structures - some of the code in breeze will have to treat those
> >>>>> differently as well. Nothing extraordinary and very complex but more
> >>>>> complex-ish than it should be - already on top of handling
> potentially
> >>>>> nested folders
> >>>>>
> >>>>> So my preference would be to stay with apache/beam - it's just more
> >>>>> consistently handling the case where provider packages can be
> one-level
> >>>>> nested
> >>>>>
> >>>>> J
> >>>>>
> >>>>>
> >>>>> wt., 7 sty 2025, 19:00 użytkownik Vincent Beck <vincb...@apache.org>
> >>>>> napisał:
> >>>>>
> >>>>>> Good question. I always found it confusing to have some providers at
> >>>>>> different level. Examples:
> >>>>>> - "airbyte" in "providers" directory (I would qualify it as
> "regular"
> >>>>>> provider)
> >>>>>> - "hive" in "providers/apache"
> >>>>>> - "amazon" in "providers" but which contains only one sub directory
> >>>> "aws"
> >>>>>>
> >>>>>> I would be in favor of using "-" instead of "/" so that all
> providers
> >>>> are
> >>>>>> at the same level.
> >>>>>>
> >>>>>>
> >>>>>> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote:
> >>>>>>> +1 one to this on general terms, it will hopefully reduce a lot of
> >> the
> >>>>>> boilerplate we need.
> >>>>>>>
> >>>>>>> As for the amazon/aws example specifically that does bring up a
> >>>>>> question, should we have `/` or `-`.. to give some examples:
> >>>>>>>
> >>>>>>> cncf kubernetes: ./providers/cncf/kubernetes or
> >>>>>> ./providers/cncf-kubernetes
> >>>>>>> Apache hive: ./providers/apache/hive or ./providers/apache-hive
> >>>>>>> AWS: ./providers/amazon/aws or ./providers/amazon-aws
> >>>>>>>
> >>>>>>> There is no requirement from python etc on one form or the other
> (as
> >>>>>> it’s just a folder, not part of the module name), so it’s what ever
> >>>> makes
> >>>>>> most sense to us.
> >>>>>>>
> >>>>>>> Jarek and Dennis (and others): what are your preferences on these
> >>>> styles?
> >>>>>>>
> >>>>>>> -ash
> >>>>>>>
> >>>>>>>> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>>>>>>
> >>>>>>>> Oh. . And one other benefit of it will be that we will be able to
> >> get
> >>>>>> rid
> >>>>>>>> of about 40% of the "Providers Manager" code. Currently, in
> >> Providers
> >>>>>>>> manager we have a lot of "ifs" that make it possible to use
> >> providers
> >>>>>> in
> >>>>>>>> breeze and local environment from the sources. In "production"
> >>>>>> installation
> >>>>>>>> we are using "get_provider_info"  entry points to discover
> providers
> >>>>>> and
> >>>>>>>> discover if provider is installed, but when you use current
> >> providers
> >>>>>>>> installed in Breeze to inside "airflow", we rely on
> `provider.yaml`
> >> to
> >>>>>> be
> >>>>>>>> present in the "airflow.providers.PROVIDER_ID" path - so we
> >>>> effectively
> >>>>>>>> have two paths of discovering information about the providers
> >>>>>> installed.
> >>>>>>>>
> >>>>>>>> After all providers are migrated to the new structure, all
> providers
> >>>>>> are
> >>>>>>>> separate "distributions" - and when you run `uv sync`  (which will
> >>>>>> install
> >>>>>>>> all providers thanks to workspace feature) or `pip install -e
> >>>>>>>> ./providers/aws` (which you will have to do manually to work on
> the
> >>>>>>>> provider - if you use `pip` rather than uv) - then we will not
> have
> >> to
> >>>>>> use
> >>>>>>>> the separate path to read provider.yaml, because the right
> >> entrypoint
> >>>>>> for
> >>>>>>>> the provider will be installed as well - so we will be able to get
> >> rid
> >>>>>> of
> >>>>>>>> quite some code that is currently only used in airflow development
> >>>>>>>> environment.
> >>>>>>>>
> >>>>>>>> J.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com>
> >>>> wrote:
> >>>>>>>>
> >>>>>>>>> Those are very good questions :)
> >>>>>>>>>
> >>>>>>>>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis
> >>>>>>>>> <ferru...@amazon.com.invalid> wrote:
> >>>>>>>>>
> >>>>>>>>>> To clarify that I understand your diagram correctly, let's say
> you
> >>>>>> clone
> >>>>>>>>>> the Airflow repo to ~/workspace/airflow/.  Does this mean that
> the
> >>>>>> AWS Glue
> >>>>>>>>>> Hook which used to live at
> >>>>>>>>>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a
> >> random
> >>>>>>>>>> example) will be located at
> >>>>>>>>>>
> >>>>>>
> >>>>
> >>
> ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py?
> >>>>>>>>>> That feels unnecessarily repetitive to me, maybe it makes sense
> >> but
> >>>>>> I'm
> >>>>>>>>>> missing the context?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Yes - it means that there is this repetitiveness but for a good
> >>>>>> reason -
> >>>>>>>>> those two "amazon/aws" serve different purpose:
> >>>>>>>>>
> >>>>>>>>> * The first "providers/amazon/aws" is just where the whole
> provider
> >>>>>>>>> "complete project" is stored - it's basically a directory where
> >> "aws
> >>>>>>>>> provider" is stored, a convenient folder to locate it in, that
> >> makes
> >>>>>> it
> >>>>>>>>> separate from other providers
> >>>>>>>>> * The second "src/airflow/providers/amazon/aws" - is the python
> >>>>>>>>> package where the source files is stored - this is how (inside
> the
> >>>>>>>>> sub-folder) you tell the actual python "import" to look for the
> >>>>>> sources.
> >>>>>>>>>
> >>>>>>>>> .What really matters is that (eventually)
> >>>>>>>>> `~/workspace/airflow/providers/amazon/aws/` can be treated as a
> >>>>>> completely
> >>>>>>>>> separate python project - a source of a "standalone" provider
> >> python
> >>>>>>>>> project.
> >>>>>>>>> There is a "pyproject.toml" file at the root of it and if you do
> >> this
> >>>>>> (for
> >>>>>>>>> example):
> >>>>>>>>>
> >>>>>>>>> cd providers/amazon/aws/
> >>>>>>>>> uv sync
> >>>>>>>>>
> >>>>>>>>> And with it you will be able to work on AWS provider exclusively
> >> as a
> >>>>>>>>> separate project (this is not yet complete with the move - tests
> >> are
> >>>>>> not
> >>>>>>>>> entirely possible to run today - but it will be possible as next
> >> step
> >>>>>> - I
> >>>>>>>>> explained it in
> >>>>>>>>>
> >> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
> >>>>>>>>>
> >>>>>>>>> This has a number of benefits, but one of them is that you will
> be
> >>>>>> able to
> >>>>>>>>> build provider by just running `build` command of your favourite
> >>>>>>>>> PEP-standard compliant frontend:
> >>>>>>>>>
> >>>>>>>>> cd providers/amazon/aws/
> >>>>>>>>> `uv build` (or `hatch build` or `poetry build` or `flit build`
> >> )....
> >>>>>>>>>
> >>>>>>>>> This will create  the provider package inside the `dist" folder.
> I
> >>>>>> just
> >>>>>>>>> did it in my PR with `uv` in the first "airbyte` project:
> >>>>>>>>>
> >>>>>>>>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
> >>>>>>>>> Building source distribution...
> >>>>>>>>> Building wheel from source distribution...
> >>>>>>>>> Successfully built
> >> dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
> >>>>>>>>> Successfully built
> >>>>>>>>> dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
> >>>>>>>>>
> >>>>>>>>> That's it. That also allows cases like installing provider
> packages
> >>>>>> using
> >>>>>>>>> git URLs - which I used earlier today to test if the incoming PR
> of
> >>>>>>>>> pygments is actually solving the problem we had yesteday
> >>>>>>>>> https://github.com/apache/airflow/pull/45416  (basically we just
> >>>>>> make our
> >>>>>>>>> provider packages "standard" python packages that all the tools
> >>>>>> support.
> >>>>>>>>> Anyone who would like to install a commit, hash or branch version
> >> of
> >>>>>> the
> >>>>>>>>> "airbyte" package from main version of Airflow repo will be able
> to
> >>>>>> do:
> >>>>>>>>>
> >>>>>>>>> pip install "apache-airflow-providers-airbyte @ git+
> >>>>>>>>>
> https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID";
> >>>>>>>>>
> >>>>>>>>> Currently in order to create the package we need to manually
> >> extract
> >>>>>> the
> >>>>>>>>> "amazon" subtree, copy it elsewhere, prepare dynamically some
> files
> >>>>>>>>> (pyproject.toml, README.rst and few others) and only then we
> build
> >>>>>> the
> >>>>>>>>> package. All this - copying file structure, creating new files,
> >>>>>> running the
> >>>>>>>>> build command after and finally deleting the copied files is now
> -
> >>>>>>>>> dynamically and under-the-hood created by "breeze
> >> release-management
> >>>>>>>>> prepare-provider-packages" command. With this change, the
> directory
> >>>>>>>>> structure in `git` repo of ours is totally standard and allows us
> >>>> (and
> >>>>>>>>> anyone else) to build the package directly from it.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> And what is the plan for system tests?   As part of this
> >>>>>> reorganization,
> >>>>>>>>>> could they be moved into providers/{PROVIDER_ID}/tests/system?
> >> That
> >>>>>> seems
> >>>>>>>>>> more intuitive to me than their current location in
> >>>>>>>>>> providers/tests/system/{PROVIDER_ID}/example_foo.py.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>> Oh yeah - I missed that in the original structure as the
> "airbyte"
> >>>>>>>>> provider (that I chose as first one) did not contain the "system"
> >>>>>> tests -
> >>>>>>>>> but one of the two providers after that i was planning to make
> sure
> >>>>>> system
> >>>>>>>>> tests are covered. They are supposed to be moved to
> "tests/system"
> >> of
> >>>>>>>>> course - Elad had similar question and I also explained it in
> >> detail
> >>>>>> in
> >>>>>>>>>
> >> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I hope it answers the questions. If not - I am happy to add more
> >>>>>>>>> clarifications :)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> J.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >>>> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>>>
> >>>>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>
>

Reply via email to