I will try to get celery next - we need a bugfix for it after yanking the last release.
On Tue, Jan 14, 2025 at 9:46 AM Elad Kalif <elad...@apache.org> wrote: > I plan to cut a wave soon but we can't test on airbyte as there were no > changes since last release > > On Tue, Jan 14, 2025 at 8:52 AM Jarek Potiuk <ja...@potiuk.com> wrote: > > > First provider (airbyte) merged. Managed to walk around Sphinx without > > answering too many riddles finally. > > > > In the next coming days I will migrate a few more providers and once we > get > > through a next provider release cycle with those few providers and > fix/see > > any kind of teething issues I will add semi-automation to move the > provider > > one-by one and I will ask for help in individually migrating all of them. > > > > The two limitations that will need to be solved later (so the provider > > projects are not yet truly standalone) > > > > * tests are only running from within airflow - you cannot run tests > inside > > `providers/airbyte" yet > > * doc build also currently works only from "airflow" main repo. - you > can't > > build docs as "command" in the build of the provider project > > > > Both are solvable but require some more reshuffling and sphinx scripts > > rewriting basically, so something that will be way easier when we migrate > > all providers and remove the old provider approach. > > > > I will summarize the changes you should expect when contributing to a > > provider in a separate mail once I complete a few more providers. > > > > J. > > > > On Fri, Jan 10, 2025 at 5:52 PM Vincent Beck <vincb...@apache.org> > wrote: > > > > > Same, I am not strongly opinionated, just a preference :) > > > > > > On 2025/01/10 15:36:05 Vikram Koka wrote: > > > > I agree this makes sense. > > > > > > > > I was originally concerned that this would make it more difficult to > > > ensure > > > > compatibility across providers for capabilities such as common.sql, > > > > objectstore, and so on. > > > > However, seeing that the "common" pattern would remain the same and > > it's > > > > only the code layout that is changing, and that we are getting rid > of a > > > ton > > > > of generated code, I am positive on this. > > > > > > > > > > > > On Fri, Jan 10, 2025 at 3:07 AM Jarek Potiuk <ja...@potiuk.com> > wrote: > > > > > > > > > So I propose letting the "doer" make the decision if we are split. > > > > > > > > > > On Fri, Jan 10, 2025 at 11:52 AM Ash Berlin-Taylor <a...@apache.org > > > > > wrote: > > > > > > > > > > > Not strong at all, preference Is all. It sounds like Vincent and > I > > > are in > > > > > > the hyphen camp and you and Maciej are in the slash camp. > > > > > > > > > > > > +1 on the “I don’t care what code style is used as long as it is > > > > > > programmatically enforced”. > > > > > > > > > > > > -a > > > > > > > > > > > > > On 10 Jan 2025, at 09:41, Jarek Potiuk <ja...@potiuk.com> > wrote: > > > > > > > > > > > > > > Is there anything else that "tastes" Ash ? A concrete reason > that > > > makes > > > > > > you > > > > > > > think the "-" prefix in this case is better than the "/" > folder? > > > How > > > > > > > strong is your "taste" preference and do you think it will have > > > some > > > > > > > lasting effect if we choose to flatten the folder structure? > > > > > > > > > > > > > > I might make a small vote to see what is the preference of > people > > > if we > > > > > > > think this is an important aspect. > > > > > > > > > > > > > > BTW. This is why I really love black/ruff formatting - we > stopped > > > > > wasting > > > > > > > time on "taste" discussion - it does not matter what is the > > > individual > > > > > > > preference, consistency is more important and I prefer to do > > stuff > > > that > > > > > > > really matters but if people feel strongly that we should > discuss > > > it, I > > > > > > > might make a vote there. > > > > > > > > > > > > > > J. > > > > > > > > > > > > > > On Thu, Jan 9, 2025 at 5:48 PM Ash Berlin-Taylor < > a...@apache.org > > > > > > > > wrote: > > > > > > > > > > > > > >> My preference is for being “more direct” and not having deeply > > > nested > > > > > > >> things where possible — I think Microsoft might be the one > case > > > where > > > > > > >> having extra folders makes sense. And I’m fine with things not > > > being > > > > > > >> consistent across providers/groups of providers. > > > > > > >> > > > > > > >> -ash > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >>> On 8 Jan 2025, at 17:18, Jarek Potiuk <ja...@potiuk.com> > > wrote: > > > > > > >>> > > > > > > >>> Can you give an example of what might break why having > > > > > > >>> `providers/aapche-beam/src/airflow/providers/apache/beam`? > > > > > > >>> > > > > > > >>> Nothing will break. It's just: > > > > > > >>> > > > > > > >>> * the code will have to be a little more complex as it will > > have > > > to > > > > > do > > > > > > >> some > > > > > > >>> conditional writes of "-" "/" > > > > > > >>> * there will be inconsistency in the depth of folders - > outside > > > it > > > > > will > > > > > > >> be > > > > > > >>> 1, inside it will be 2 (as it is in your example)/ > > > > > > >>> * it will be a bit more convention/ complex to limit related > > > > > providers > > > > > > >> (say > > > > > > >>> microsoft) - with the current scheme "providers/microsoft" is > > the > > > > > > >> directory > > > > > > >>> containing all microsoft providers. If we change it to "-", > you > > > have > > > > > to > > > > > > >>> find all sub-directories following "microsoft-*" convention. > > > > > > >>> > > > > > > >>> I am not super-strong on it - we could do either, it's just > my > > > > > > preference > > > > > > >>> to use folders for grouping related things (as folders were > > > designed > > > > > > >> for). > > > > > > >>> > > > > > > >>> J. > > > > > > >>> > > > > > > >>> On Wed, Jan 8, 2025 at 5:03 PM Ash Berlin-Taylor < > > a...@apache.org > > > > > > > > > > wrote: > > > > > > >>> > > > > > > >>>>> And we already have a number of mappings and conventions to > > > handle > > > > > > >> that. > > > > > > >>>>> For example provider I'd mapping to dirs (apache.beam -> > > > > > > apache/beam), > > > > > > >>>> and > > > > > > >>>>> 'apache-airflow-providers-apache-beam' as package na e and > > > > > > >>>>> airflow/providers/apache/beam as packages inside the > > > distribution. > > > > > > >> Those > > > > > > >>>>> will remain as they are - we cannot change them without > > > breaking > > > > > > >>>>> compatibility. > > > > > > >>>> > > > > > > >>>> Can you give an example of what might break why having > > > > > > >>>> `providers/aapche-beam/src/airflow/providers/apache/beam`? > > > > > > >>>> > > > > > > >>>> -a > > > > > > >>>> > > > > > > >>>>> On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> > > > wrote: > > > > > > >>>>> > > > > > > >>>>> I think it will be better to keep it. > > > > > > >>>>> > > > > > > >>>>> The reason we have varying levels were to group things > > > together - > > > > > > >> mainly > > > > > > >>>>> Apache related providers, but also Microsoft. > > > > > > >>>>> > > > > > > >>>>> And we already have a number of mappings and conventions to > > > handle > > > > > > >> that. > > > > > > >>>>> For example provider I'd mapping to dirs (apache.beam -> > > > > > > apache/beam), > > > > > > >>>> and > > > > > > >>>>> 'apache-airflow-providers-apache-beam' as package na e and > > > > > > >>>>> airflow/providers/apache/beam as packages inside the > > > distribution. > > > > > > >> Those > > > > > > >>>>> will remain as they are - we cannot change them without > > > breaking > > > > > > >>>>> compatibility. > > > > > > >>>>> > > > > > > >>>>> So if we change it to a flat structure we will have some > > > > > > >> inconsistencies > > > > > > >>>> - > > > > > > >>>>> in some cases it will be single folder in others (packages) > > > those > > > > > > will > > > > > > >> be > > > > > > >>>>> two folders. > > > > > > >>>>> > > > > > > >>>>> I think it will be more harm than good if we get rid of the > > > > > 'folder' > > > > > > >>>>> structures - some of the code in breeze will have to treat > > > those > > > > > > >>>>> differently as well. Nothing extraordinary and very complex > > but > > > > > more > > > > > > >>>>> complex-ish than it should be - already on top of handling > > > > > > potentially > > > > > > >>>>> nested folders > > > > > > >>>>> > > > > > > >>>>> So my preference would be to stay with apache/beam - it's > > just > > > more > > > > > > >>>>> consistently handling the case where provider packages can > be > > > > > > one-level > > > > > > >>>>> nested > > > > > > >>>>> > > > > > > >>>>> J > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>>>> wt., 7 sty 2025, 19:00 użytkownik Vincent Beck < > > > > > vincb...@apache.org> > > > > > > >>>>> napisał: > > > > > > >>>>> > > > > > > >>>>>> Good question. I always found it confusing to have some > > > providers > > > > > at > > > > > > >>>>>> different level. Examples: > > > > > > >>>>>> - "airbyte" in "providers" directory (I would qualify it > as > > > > > > "regular" > > > > > > >>>>>> provider) > > > > > > >>>>>> - "hive" in "providers/apache" > > > > > > >>>>>> - "amazon" in "providers" but which contains only one sub > > > > > directory > > > > > > >>>> "aws" > > > > > > >>>>>> > > > > > > >>>>>> I would be in favor of using "-" instead of "/" so that > all > > > > > > providers > > > > > > >>>> are > > > > > > >>>>>> at the same level. > > > > > > >>>>>> > > > > > > >>>>>> > > > > > > >>>>>> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote: > > > > > > >>>>>>> +1 one to this on general terms, it will hopefully > reduce a > > > lot > > > > > of > > > > > > >> the > > > > > > >>>>>> boilerplate we need. > > > > > > >>>>>>> > > > > > > >>>>>>> As for the amazon/aws example specifically that does > bring > > > up a > > > > > > >>>>>> question, should we have `/` or `-`.. to give some > examples: > > > > > > >>>>>>> > > > > > > >>>>>>> cncf kubernetes: ./providers/cncf/kubernetes or > > > > > > >>>>>> ./providers/cncf-kubernetes > > > > > > >>>>>>> Apache hive: ./providers/apache/hive or > > > ./providers/apache-hive > > > > > > >>>>>>> AWS: ./providers/amazon/aws or ./providers/amazon-aws > > > > > > >>>>>>> > > > > > > >>>>>>> There is no requirement from python etc on one form or > the > > > other > > > > > > (as > > > > > > >>>>>> it’s just a folder, not part of the module name), so it’s > > what > > > > > ever > > > > > > >>>> makes > > > > > > >>>>>> most sense to us. > > > > > > >>>>>>> > > > > > > >>>>>>> Jarek and Dennis (and others): what are your preferences > on > > > these > > > > > > >>>> styles? > > > > > > >>>>>>> > > > > > > >>>>>>> -ash > > > > > > >>>>>>> > > > > > > >>>>>>>> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com > > > > > wrote: > > > > > > >>>>>>>> > > > > > > >>>>>>>> Oh. . And one other benefit of it will be that we will > be > > > able > > > > > to > > > > > > >> get > > > > > > >>>>>> rid > > > > > > >>>>>>>> of about 40% of the "Providers Manager" code. Currently, > > in > > > > > > >> Providers > > > > > > >>>>>>>> manager we have a lot of "ifs" that make it possible to > > use > > > > > > >> providers > > > > > > >>>>>> in > > > > > > >>>>>>>> breeze and local environment from the sources. In > > > "production" > > > > > > >>>>>> installation > > > > > > >>>>>>>> we are using "get_provider_info" entry points to > discover > > > > > > providers > > > > > > >>>>>> and > > > > > > >>>>>>>> discover if provider is installed, but when you use > > current > > > > > > >> providers > > > > > > >>>>>>>> installed in Breeze to inside "airflow", we rely on > > > > > > `provider.yaml` > > > > > > >> to > > > > > > >>>>>> be > > > > > > >>>>>>>> present in the "airflow.providers.PROVIDER_ID" path - so > > we > > > > > > >>>> effectively > > > > > > >>>>>>>> have two paths of discovering information about the > > > providers > > > > > > >>>>>> installed. > > > > > > >>>>>>>> > > > > > > >>>>>>>> After all providers are migrated to the new structure, > all > > > > > > providers > > > > > > >>>>>> are > > > > > > >>>>>>>> separate "distributions" - and when you run `uv sync` > > > (which > > > > > will > > > > > > >>>>>> install > > > > > > >>>>>>>> all providers thanks to workspace feature) or `pip > install > > > -e > > > > > > >>>>>>>> ./providers/aws` (which you will have to do manually to > > > work on > > > > > > the > > > > > > >>>>>>>> provider - if you use `pip` rather than uv) - then we > will > > > not > > > > > > have > > > > > > >> to > > > > > > >>>>>> use > > > > > > >>>>>>>> the separate path to read provider.yaml, because the > right > > > > > > >> entrypoint > > > > > > >>>>>> for > > > > > > >>>>>>>> the provider will be installed as well - so we will be > > able > > > to > > > > > get > > > > > > >> rid > > > > > > >>>>>> of > > > > > > >>>>>>>> quite some code that is currently only used in airflow > > > > > development > > > > > > >>>>>>>> environment. > > > > > > >>>>>>>> > > > > > > >>>>>>>> J. > > > > > > >>>>>>>> > > > > > > >>>>>>>> > > > > > > >>>>>>>> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk < > > > ja...@potiuk.com> > > > > > > >>>> wrote: > > > > > > >>>>>>>> > > > > > > >>>>>>>>> Those are very good questions :) > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis > > > > > > >>>>>>>>> <ferru...@amazon.com.invalid> wrote: > > > > > > >>>>>>>>> > > > > > > >>>>>>>>>> To clarify that I understand your diagram correctly, > > > let's say > > > > > > you > > > > > > >>>>>> clone > > > > > > >>>>>>>>>> the Airflow repo to ~/workspace/airflow/. Does this > > mean > > > that > > > > > > the > > > > > > >>>>>> AWS Glue > > > > > > >>>>>>>>>> Hook which used to live at > > > > > > >>>>>>>>>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py > > > (as a > > > > > > >> random > > > > > > >>>>>>>>>> example) will be located at > > > > > > >>>>>>>>>> > > > > > > >>>>>> > > > > > > >>>> > > > > > > >> > > > > > > > > > > > > > > > > > ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py? > > > > > > >>>>>>>>>> That feels unnecessarily repetitive to me, maybe it > > makes > > > > > sense > > > > > > >> but > > > > > > >>>>>> I'm > > > > > > >>>>>>>>>> missing the context? > > > > > > >>>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> Yes - it means that there is this repetitiveness but > for > > a > > > good > > > > > > >>>>>> reason - > > > > > > >>>>>>>>> those two "amazon/aws" serve different purpose: > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> * The first "providers/amazon/aws" is just where the > > whole > > > > > > provider > > > > > > >>>>>>>>> "complete project" is stored - it's basically a > directory > > > where > > > > > > >> "aws > > > > > > >>>>>>>>> provider" is stored, a convenient folder to locate it > in, > > > that > > > > > > >> makes > > > > > > >>>>>> it > > > > > > >>>>>>>>> separate from other providers > > > > > > >>>>>>>>> * The second "src/airflow/providers/amazon/aws" - is > the > > > python > > > > > > >>>>>>>>> package where the source files is stored - this is how > > > (inside > > > > > > the > > > > > > >>>>>>>>> sub-folder) you tell the actual python "import" to look > > > for the > > > > > > >>>>>> sources. > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> .What really matters is that (eventually) > > > > > > >>>>>>>>> `~/workspace/airflow/providers/amazon/aws/` can be > > treated > > > as a > > > > > > >>>>>> completely > > > > > > >>>>>>>>> separate python project - a source of a "standalone" > > > provider > > > > > > >> python > > > > > > >>>>>>>>> project. > > > > > > >>>>>>>>> There is a "pyproject.toml" file at the root of it and > if > > > you > > > > > do > > > > > > >> this > > > > > > >>>>>> (for > > > > > > >>>>>>>>> example): > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> cd providers/amazon/aws/ > > > > > > >>>>>>>>> uv sync > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> And with it you will be able to work on AWS provider > > > > > exclusively > > > > > > >> as a > > > > > > >>>>>>>>> separate project (this is not yet complete with the > move > > - > > > > > tests > > > > > > >> are > > > > > > >>>>>> not > > > > > > >>>>>>>>> entirely possible to run today - but it will be > possible > > as > > > > > next > > > > > > >> step > > > > > > >>>>>> - I > > > > > > >>>>>>>>> explained it in > > > > > > >>>>>>>>> > > > > > > >> > > > https://github.com/apache/airflow/pull/45259#issuecomment-2572427916 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> This has a number of benefits, but one of them is that > > you > > > will > > > > > > be > > > > > > >>>>>> able to > > > > > > >>>>>>>>> build provider by just running `build` command of your > > > > > favourite > > > > > > >>>>>>>>> PEP-standard compliant frontend: > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> cd providers/amazon/aws/ > > > > > > >>>>>>>>> `uv build` (or `hatch build` or `poetry build` or `flit > > > build` > > > > > > >> ).... > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> This will create the provider package inside the > `dist" > > > > > folder. > > > > > > I > > > > > > >>>>>> just > > > > > > >>>>>>>>> did it in my PR with `uv` in the first "airbyte` > project: > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv > > build > > > > > > >>>>>>>>> Building source distribution... > > > > > > >>>>>>>>> Building wheel from source distribution... > > > > > > >>>>>>>>> Successfully built > > > > > > >> dist/apache_airflow_providers_airbyte-5.0.0.tar.gz > > > > > > >>>>>>>>> Successfully built > > > > > > >>>>>>>>> > > > dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> That's it. That also allows cases like installing > > provider > > > > > > packages > > > > > > >>>>>> using > > > > > > >>>>>>>>> git URLs - which I used earlier today to test if the > > > incoming > > > > > PR > > > > > > of > > > > > > >>>>>>>>> pygments is actually solving the problem we had > yesteday > > > > > > >>>>>>>>> https://github.com/apache/airflow/pull/45416 > (basically > > > we > > > > > just > > > > > > >>>>>> make our > > > > > > >>>>>>>>> provider packages "standard" python packages that all > the > > > tools > > > > > > >>>>>> support. > > > > > > >>>>>>>>> Anyone who would like to install a commit, hash or > branch > > > > > version > > > > > > >> of > > > > > > >>>>>> the > > > > > > >>>>>>>>> "airbyte" package from main version of Airflow repo > will > > be > > > > > able > > > > > > to > > > > > > >>>>>> do: > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> pip install "apache-airflow-providers-airbyte @ git+ > > > > > > >>>>>>>>> > > > > > > > https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID" > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> Currently in order to create the package we need to > > > manually > > > > > > >> extract > > > > > > >>>>>> the > > > > > > >>>>>>>>> "amazon" subtree, copy it elsewhere, prepare > dynamically > > > some > > > > > > files > > > > > > >>>>>>>>> (pyproject.toml, README.rst and few others) and only > then > > > we > > > > > > build > > > > > > >>>>>> the > > > > > > >>>>>>>>> package. All this - copying file structure, creating > new > > > files, > > > > > > >>>>>> running the > > > > > > >>>>>>>>> build command after and finally deleting the copied > files > > > is > > > > > now > > > > > > - > > > > > > >>>>>>>>> dynamically and under-the-hood created by "breeze > > > > > > >> release-management > > > > > > >>>>>>>>> prepare-provider-packages" command. With this change, > the > > > > > > directory > > > > > > >>>>>>>>> structure in `git` repo of ours is totally standard and > > > allows > > > > > us > > > > > > >>>> (and > > > > > > >>>>>>>>> anyone else) to build the package directly from it. > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> And what is the plan for system tests? As part of > this > > > > > > >>>>>> reorganization, > > > > > > >>>>>>>>>> could they be moved into > > > providers/{PROVIDER_ID}/tests/system? > > > > > > >> That > > > > > > >>>>>> seems > > > > > > >>>>>>>>>> more intuitive to me than their current location in > > > > > > >>>>>>>>>> providers/tests/system/{PROVIDER_ID}/example_foo.py. > > > > > > >>>>>>>>>> > > > > > > >>>>>>>>>> > > > > > > >>>>>>>>> Oh yeah - I missed that in the original structure as > the > > > > > > "airbyte" > > > > > > >>>>>>>>> provider (that I chose as first one) did not contain > the > > > > > "system" > > > > > > >>>>>> tests - > > > > > > >>>>>>>>> but one of the two providers after that i was planning > to > > > make > > > > > > sure > > > > > > >>>>>> system > > > > > > >>>>>>>>> tests are covered. They are supposed to be moved to > > > > > > "tests/system" > > > > > > >> of > > > > > > >>>>>>>>> course - Elad had similar question and I also explained > > it > > > in > > > > > > >> detail > > > > > > >>>>>> in > > > > > > >>>>>>>>> > > > > > > >> > > > https://github.com/apache/airflow/pull/45259#issuecomment-2572427916 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> I hope it answers the questions. If not - I am happy to > > add > > > > > more > > > > > > >>>>>>>>> clarifications :) > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>>> J. > > > > > > >>>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>> > > > > > > >>>>>>> > > > > > > >>>>>> > > > > > > >>>>>> > > > > > > > > --------------------------------------------------------------------- > > > > > > >>>>>> To unsubscribe, e-mail: > dev-unsubscr...@airflow.apache.org > > > > > > >>>>>> For additional commands, e-mail: > > dev-h...@airflow.apache.org > > > > > > >>>>>> > > > > > > >>>>>> > > > > > > >>>> > > > > > > >>>> > > > > > > >>>> > > > > > > --------------------------------------------------------------------- > > > > > > >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > > > > > >>>> For additional commands, e-mail: > dev-h...@airflow.apache.org > > > > > > >>>> > > > > > > >>>> > > > > > > >> > > > > > > >> > > > > > > >> > > > --------------------------------------------------------------------- > > > > > > >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > > > > > >> For additional commands, e-mail: dev-h...@airflow.apache.org > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > > > > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > > > > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > > > > > >