I plan to cut a provider wave soon, but we can't test this on airbyte, as there
have been no changes to it since the last release.

On Tue, Jan 14, 2025 at 8:52 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> First provider (airbyte) merged. Managed to walk around Sphinx without
> answering too many riddles finally.
>
> Over the coming days I will migrate a few more providers. Once we get
> through the next provider release cycle with those few providers and see/fix
> any teething issues, I will add semi-automation to move the providers
> one by one, and I will ask for help in individually migrating all of them.
>
> Two limitations will need to be solved later (so the provider
> projects are not yet truly standalone):
>
> * tests only run from within airflow - you cannot run tests inside
> `providers/airbyte` yet
> * the doc build also currently works only from the "airflow" main repo - you can't
> build docs as a "command" in the build of the provider project
>
> Both are solvable but require some more reshuffling and rewriting of the sphinx
> scripts, so this will be way easier once we have migrated
> all providers and removed the old provider approach.
>
> I will summarize the changes you should expect when contributing to a
> provider in a separate mail once I complete a few more providers.
>
> J.
>
> On Fri, Jan 10, 2025 at 5:52 PM Vincent Beck <vincb...@apache.org> wrote:
>
> > Same, I am not strongly opinionated, just a preference :)
> >
> > On 2025/01/10 15:36:05 Vikram Koka wrote:
> > > I agree this makes sense.
> > >
> > > > I was originally concerned that this would make it more difficult to ensure
> > > > compatibility across providers for capabilities such as common.sql,
> > > > objectstore, and so on.
> > > > However, seeing that the "common" pattern would remain the same and it's
> > > > only the code layout that is changing, and that we are getting rid of a ton
> > > > of generated code, I am positive on this.
> > >
> > >
> > > On Fri, Jan 10, 2025 at 3:07 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > > So I propose letting the "doer" make the decision if we are split.
> > > >
> > > > > On Fri, Jan 10, 2025 at 11:52 AM Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > >
> > > > > > Not strong at all, a preference is all. It sounds like Vincent and I are in
> > > > > > the hyphen camp and you and Maciej are in the slash camp.
> > > > >
> > > > > +1 on the “I don’t care what code style is used as long as it is
> > > > > programmatically enforced”.
> > > > >
> > > > > -a
> > > > >
> > > > > > On 10 Jan 2025, at 09:41, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > > >
> > > > > > > Is there anything else beyond "taste", Ash? A concrete reason that makes you
> > > > > > > think the "-" prefix in this case is better than the "/" folder? How
> > > > > > > strong is your "taste" preference, and do you think it will have some
> > > > > > > lasting effect if we choose to flatten the folder structure?
> > > > > >
> > > > > > > I might hold a small vote to see what people prefer if we
> > > > > > > think this is an important aspect.
> > > > > > >
> > > > > > > BTW. This is why I really love black/ruff formatting - we stopped wasting
> > > > > > > time on "taste" discussions. It does not matter what the individual
> > > > > > > preference is; consistency is more important, and I prefer to do stuff that
> > > > > > > really matters. But if people feel strongly that we should discuss it, I
> > > > > > > might hold a vote there.
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > > On Thu, Jan 9, 2025 at 5:48 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > > > >
> > > > > > >> My preference is for being “more direct” and not having deeply nested
> > > > > > >> things where possible - I think Microsoft might be the one case where
> > > > > > >> having extra folders makes sense. And I’m fine with things not being
> > > > > > >> consistent across providers/groups of providers.
> > > > > >>
> > > > > >> -ash
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > > >>> On 8 Jan 2025, at 17:18, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > > > >>>
> > > > > > >>> Can you give an example of what might break by having
> > > > > > >>> `providers/apache-beam/src/airflow/providers/apache/beam`?
> > > > > >>>
> > > > > >>> Nothing will break. It's just:
> > > > > >>>
> > > > > > >>> * the code will have to be a little more complex, as it will have to do some
> > > > > > >>> conditional writes of "-" vs "/"
> > > > > > >>> * there will be inconsistency in the depth of folders - outside it will be
> > > > > > >>> 1, inside it will be 2 (as it is in your example)
> > > > > > >>> * it will be a bit more complex, and more reliant on convention, to select
> > > > > > >>> related providers (say microsoft) - with the current scheme "providers/microsoft"
> > > > > > >>> is the directory containing all microsoft providers. If we change it to "-", you
> > > > > > >>> have to find all sub-directories following the "microsoft-*" convention.
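
To make the third point concrete, here is a minimal sketch of how tooling might
collect the providers of one group under each layout. The function names and
the sample provider folders are only illustrative assumptions, not the actual
breeze code.

```python
import tempfile
from pathlib import Path


def related_nested(providers_root: Path, group: str) -> list[str]:
    """Nested layout: every provider of a group lives under providers/<group>/."""
    return sorted(p.name for p in (providers_root / group).iterdir() if p.is_dir())


def related_flat(providers_root: Path, group: str) -> list[str]:
    """Flat layout: a group is only recognisable by the '<group>-*' naming convention."""
    prefix = f"{group}-"
    return sorted(
        p.name[len(prefix):] for p in providers_root.glob(f"{prefix}*") if p.is_dir()
    )


def demo() -> tuple[list[str], list[str]]:
    """Build both (hypothetical) layouts in a temp dir and list the microsoft group."""
    with tempfile.TemporaryDirectory() as tmp:
        nested = Path(tmp) / "nested" / "providers"
        flat = Path(tmp) / "flat" / "providers"
        for name in ("azure", "mssql", "winrm"):  # sample names for illustration
            (nested / "microsoft" / name).mkdir(parents=True)
            (flat / f"microsoft-{name}").mkdir(parents=True)
        (nested / "airbyte").mkdir()
        (flat / "airbyte").mkdir()
        return related_nested(nested, "microsoft"), related_flat(flat, "microsoft")


if __name__ == "__main__":
    print(demo())
```

Both variants are short; the difference is that the nested one relies on the
directory itself, while the flat one relies on everybody following the naming
convention.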
> > > > > >>>
> > > > > > >>> I am not super-strong on it - we could do either, it's just my preference
> > > > > > >>> to use folders for grouping related things (as folders were designed for).
> > > > > >>>
> > > > > >>> J.
> > > > > >>>
> > > > > > >>> On Wed, Jan 8, 2025 at 5:03 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > > > >>>
> > > > > > >>>>> And we already have a number of mappings and conventions to handle that.
> > > > > > >>>>> For example, provider id mapping to dirs (apache.beam -> apache/beam), and
> > > > > > >>>>> 'apache-airflow-providers-apache-beam' as package name and
> > > > > > >>>>> airflow/providers/apache/beam as packages inside the distribution. Those
> > > > > > >>>>> will remain as they are - we cannot change them without breaking
> > > > > > >>>>> compatibility.
> > > > > > >>>>
> > > > > > >>>> Can you give an example of what might break by having
> > > > > > >>>> `providers/apache-beam/src/airflow/providers/apache/beam`?
> > > > > >>>>
> > > > > >>>> -a
> > > > > >>>>
> > > > > > >>>>> On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > > > >>>>>
> > > > > > >>>>> I think it will be better to keep it.
> > > > > > >>>>>
> > > > > > >>>>> The reason we have varying levels was to group things together - mainly
> > > > > > >>>>> Apache-related providers, but also Microsoft.
> > > > > > >>>>>
> > > > > > >>>>> And we already have a number of mappings and conventions to handle that.
> > > > > > >>>>> For example, provider id mapping to dirs (apache.beam -> apache/beam), and
> > > > > > >>>>> 'apache-airflow-providers-apache-beam' as package name and
> > > > > > >>>>> airflow/providers/apache/beam as packages inside the distribution. Those
> > > > > > >>>>> will remain as they are - we cannot change them without breaking
> > > > > > >>>>> compatibility.
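
The mappings named above can be written down as a small sketch. The helper
names are made up for illustration and this is not the actual breeze
implementation; only the apache.beam examples come from the mail itself.

```python
def provider_dir(provider_id: str) -> str:
    """Folder of the provider project: apache.beam -> apache/beam."""
    return provider_id.replace(".", "/")


def distribution_name(provider_id: str) -> str:
    """Package (distribution) name: apache.beam -> apache-airflow-providers-apache-beam."""
    return "apache-airflow-providers-" + provider_id.replace(".", "-")


def import_package_path(provider_id: str) -> str:
    """Python package path inside the distribution: airflow/providers/apache/beam."""
    return "airflow/providers/" + provider_id.replace(".", "/")


def flat_provider_dir(provider_id: str) -> str:
    """The alternative discussed in this thread: apache.beam -> apache-beam."""
    return provider_id.replace(".", "-")


if __name__ == "__main__":
    for pid in ("airbyte", "apache.beam"):
        print(pid, provider_dir(pid), distribution_name(pid), import_package_path(pid))
```

With the flat variant the project folder would use "-" while the python
package path inside it still has to use "/", which is the inconsistency in
question.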
> > > > > >>>>>
> > > > > > >>>>> So if we change it to a flat structure we will have some inconsistencies -
> > > > > > >>>>> in some cases it will be a single folder, in others (packages) it will be
> > > > > > >>>>> two folders.
> > > > > > >>>>>
> > > > > > >>>>> I think it will do more harm than good if we get rid of the 'folder'
> > > > > > >>>>> structures - some of the code in breeze will have to treat those
> > > > > > >>>>> differently as well. Nothing extraordinary or very complex, but more
> > > > > > >>>>> complex-ish than it should be - and that on top of already handling
> > > > > > >>>>> potentially nested folders.
> > > > > > >>>>>
> > > > > > >>>>> So my preference would be to stay with apache/beam - it's just more
> > > > > > >>>>> consistent in handling the case where provider packages can be one-level
> > > > > > >>>>> nested.
> > > > > >>>>>
> > > > > >>>>> J
> > > > > >>>>>
> > > > > >>>>>
> > > > > > >>>>> On Tue, Jan 7, 2025 at 7:00 PM Vincent Beck <vincb...@apache.org> wrote:
> > > > > >>>>>
> > > > > > >>>>>> Good question. I always found it confusing to have some providers at
> > > > > > >>>>>> different levels. Examples:
> > > > > > >>>>>> - "airbyte" in the "providers" directory (I would qualify it as a "regular"
> > > > > > >>>>>> provider)
> > > > > > >>>>>> - "hive" in "providers/apache"
> > > > > > >>>>>> - "amazon" in "providers", but it contains only one subdirectory, "aws"
> > > > > > >>>>>>
> > > > > > >>>>>> I would be in favor of using "-" instead of "/" so that all providers are
> > > > > > >>>>>> at the same level.
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote:
> > > > > > >>>>>>> +1 to this in general terms, it will hopefully reduce a lot of the
> > > > > > >>>>>>> boilerplate we need.
> > > > > > >>>>>>>
> > > > > > >>>>>>> As for the amazon/aws example specifically, that does bring up a
> > > > > > >>>>>>> question: should we have `/` or `-`? To give some examples:
> > > > > > >>>>>>>
> > > > > > >>>>>>> cncf kubernetes: ./providers/cncf/kubernetes or ./providers/cncf-kubernetes
> > > > > > >>>>>>> Apache hive: ./providers/apache/hive or ./providers/apache-hive
> > > > > > >>>>>>> AWS: ./providers/amazon/aws or ./providers/amazon-aws
> > > > > > >>>>>>>
> > > > > > >>>>>>> There is no requirement from python etc. on one form or the other (as
> > > > > > >>>>>>> it’s just a folder, not part of the module name), so it’s whatever makes
> > > > > > >>>>>>> most sense to us.
> > > > > > >>>>>>>
> > > > > > >>>>>>> Jarek and Dennis (and others): what are your preferences on these styles?
> > > > > >>>>>>>
> > > > > >>>>>>> -ash
> > > > > >>>>>>>
> > > > > > >>>>>>>> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Oh... And one other benefit of it will be that we will be able to get rid
> > > > > > >>>>>>>> of about 40% of the "Providers Manager" code. Currently, in Providers
> > > > > > >>>>>>>> manager we have a lot of "ifs" that make it possible to use providers in
> > > > > > >>>>>>>> breeze and the local environment from the sources. In a "production" installation
> > > > > > >>>>>>>> we use the "get_provider_info" entry points to discover providers and
> > > > > > >>>>>>>> to discover if a provider is installed, but when you use the current providers
> > > > > > >>>>>>>> installed in Breeze inside "airflow", we rely on `provider.yaml` to be
> > > > > > >>>>>>>> present in the "airflow.providers.PROVIDER_ID" path - so we effectively
> > > > > > >>>>>>>> have two paths for discovering information about the providers installed.
> > > > > >>>>>>>>
> > > > > > >>>>>>>> After all providers are migrated to the new structure, all providers are
> > > > > > >>>>>>>> separate "distributions" - and when you run `uv sync` (which will install
> > > > > > >>>>>>>> all providers thanks to the workspace feature) or
> > > > > > >>>>>>>> `pip install -e ./providers/aws` (which you will have to do manually to work
> > > > > > >>>>>>>> on the provider if you use `pip` rather than uv), we will not have to use
> > > > > > >>>>>>>> the separate path to read provider.yaml, because the right entrypoint for
> > > > > > >>>>>>>> the provider will be installed as well - so we will be able to get rid of
> > > > > > >>>>>>>> quite some code that is currently only used in the airflow development
> > > > > > >>>>>>>> environment.
> > > > > >>>>>>>>
> > > > > >>>>>>>> J.
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > > >>>>>>>> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Those are very good questions :)
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis
> > > > > >>>>>>>>> <ferru...@amazon.com.invalid> wrote:
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>>> To clarify that I understand your diagram correctly, let's say you clone
> > > > > > >>>>>>>>>> the Airflow repo to ~/workspace/airflow/.  Does this mean that the AWS Glue
> > > > > > >>>>>>>>>> Hook which used to live at
> > > > > > >>>>>>>>>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a random
> > > > > > >>>>>>>>>> example) will be located at
> > > > > > >>>>>>>>>> ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py?
> > > > > > >>>>>>>>>> That feels unnecessarily repetitive to me, maybe it makes sense but I'm
> > > > > > >>>>>>>>>> missing the context?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Yes - it means that there is this repetitiveness, but for a good reason -
> > > > > > >>>>>>>>> those two "amazon/aws" serve different purposes:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> * The first, "providers/amazon/aws", is just where the whole provider
> > > > > > >>>>>>>>> "complete project" is stored - it's basically a directory where the "aws
> > > > > > >>>>>>>>> provider" is stored, a convenient folder to locate it in, that makes it
> > > > > > >>>>>>>>> separate from other providers
> > > > > > >>>>>>>>> * The second, "src/airflow/providers/amazon/aws", is the python
> > > > > > >>>>>>>>> package where the source files are stored - this is how (inside the
> > > > > > >>>>>>>>> sub-folder) you tell the actual python "import" where to look for the sources.
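
A small sketch of the two roles, using the glue.py path from the question. The
helper name is hypothetical; it just splits a file path at "src" into the
project folder and the importable module name.

```python
from pathlib import PurePosixPath


def split_src_layout(path: str) -> tuple[str, str]:
    """Split a file path in a "src" layout into (project folder, module name)."""
    parts = PurePosixPath(path).with_suffix("").parts
    idx = parts.index("src")                # everything before "src" is the project
    project_root = "/".join(parts[:idx])    # where pyproject.toml lives
    module_name = ".".join(parts[idx + 1:])  # what python actually imports
    return project_root, module_name


if __name__ == "__main__":
    print(
        split_src_layout(
            "providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py"
        )
    )
```

The first "amazon/aws" ends up only in the project folder and the second only
in the module name, so the import path of the hook does not change with the
move.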
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>> What really matters is that (eventually)
> > > > > > >>>>>>>>> `~/workspace/airflow/providers/amazon/aws/` can be treated as a completely
> > > > > > >>>>>>>>> separate python project - a source of a "standalone" provider python
> > > > > > >>>>>>>>> project.
> > > > > > >>>>>>>>> There is a "pyproject.toml" file at the root of it, and you can do this (for
> > > > > > >>>>>>>>> example):
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> cd providers/amazon/aws/
> > > > > > >>>>>>>>> uv sync
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> With it you will be able to work on the AWS provider exclusively as a
> > > > > > >>>>>>>>> separate project (this is not yet complete with the move - tests are not
> > > > > > >>>>>>>>> entirely possible to run today - but it will be possible as the next step - I
> > > > > > >>>>>>>>> explained it in
> > > > > > >>>>>>>>> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916).
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>> This has a number of benefits, but one of them is that you will be able to
> > > > > > >>>>>>>>> build the provider by just running the `build` command of your favourite
> > > > > > >>>>>>>>> PEP-standard compliant frontend:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> cd providers/amazon/aws/
> > > > > > >>>>>>>>> `uv build` (or `hatch build` or `poetry build` or `flit build`)...
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> This will create the provider package inside the `dist` folder. I just
> > > > > > >>>>>>>>> did it in my PR with `uv` in the first `airbyte` project:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
> > > > > > >>>>>>>>> Building source distribution...
> > > > > > >>>>>>>>> Building wheel from source distribution...
> > > > > > >>>>>>>>> Successfully built dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
> > > > > > >>>>>>>>> Successfully built dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>> That's it. That also allows cases like installing provider packages using
> > > > > > >>>>>>>>> git URLs - which I used earlier today to test if the incoming PR of
> > > > > > >>>>>>>>> pygments is actually solving the problem we had yesterday
> > > > > > >>>>>>>>> https://github.com/apache/airflow/pull/45416 (basically we just make our
> > > > > > >>>>>>>>> provider packages "standard" python packages that all the tools support).
> > > > > > >>>>>>>>> Anyone who would like to install a commit, hash or branch version of the
> > > > > > >>>>>>>>> "airbyte" package from the main version of the Airflow repo will be able to do:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> pip install "apache-airflow-providers-airbyte @ git+https://github.com/apache/airflow.git@COMMIT_ID#subdirectory=providers/airbyte"
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Currently, in order to create the package we need to extract the
> > > > > > >>>>>>>>> "amazon" subtree, copy it elsewhere, dynamically prepare some files
> > > > > > >>>>>>>>> (pyproject.toml, README.rst and a few others) and only then build the
> > > > > > >>>>>>>>> package. All this - copying the file structure, creating new files, running the
> > > > > > >>>>>>>>> build command afterwards and finally deleting the copied files - is currently
> > > > > > >>>>>>>>> done dynamically and under the hood by the "breeze release-management
> > > > > > >>>>>>>>> prepare-provider-packages" command. With this change, the directory
> > > > > > >>>>>>>>> structure in our `git` repo is totally standard and allows us (and
> > > > > > >>>>>>>>> anyone else) to build the package directly from it.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>>> And what is the plan for system tests?   As part of this reorganization,
> > > > > > >>>>>>>>>> could they be moved into providers/{PROVIDER_ID}/tests/system? That seems
> > > > > > >>>>>>>>>> more intuitive to me than their current location in
> > > > > > >>>>>>>>>> providers/tests/system/{PROVIDER_ID}/example_foo.py.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > > >>>>>>>>> Oh yeah - I missed that in the original structure as the "airbyte"
> > > > > > >>>>>>>>> provider (which I chose as the first one) did not contain "system" tests -
> > > > > > >>>>>>>>> but with one of the next two providers I was planning to make sure system
> > > > > > >>>>>>>>> tests are covered. They are supposed to be moved to "tests/system" of
> > > > > > >>>>>>>>> course - Elad had a similar question and I also explained it in detail in
> > > > > > >>>>>>>>> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>> I hope it answers the questions. If not - I am happy to add more
> > > > > > >>>>>>>>> clarifications :)
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> J.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > > >>>>>> ---------------------------------------------------------------------
> > > > > > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > > > >>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > ---------------------------------------------------------------------
> > > > > >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > > >>>> For additional commands, e-mail: dev-h...@airflow.apache.org
> > > > > >>>>
> > > > > >>>>
> > > > > >>
> > > > > >>
> > > > > >>
> > ---------------------------------------------------------------------
> > > > > >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > > >> For additional commands, e-mail: dev-h...@airflow.apache.org
> > > > > >>
> > > > > >>
> > > > >
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > > > >
> > > > >
> > > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > For additional commands, e-mail: dev-h...@airflow.apache.org
> >
> >
>
