> And we already have a number of mappings and conventions to handle that.
> For example, the provider id maps to dirs (apache.beam -> apache/beam),
> 'apache-airflow-providers-apache-beam' is the package name, and
> airflow/providers/apache/beam is the package inside the distribution. Those
> will remain as they are - we cannot change them without breaking
> compatibility.

Can you give an example of what might break by having
`providers/apache-beam/src/airflow/providers/apache/beam`?

-a

> On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
> I think it will be better to keep it.
> 
> The reason we have varying levels was to group things together - mainly
> Apache-related providers, but also Microsoft.
> 
> And we already have a number of mappings and conventions to handle that.
> For example, the provider id maps to dirs (apache.beam -> apache/beam),
> 'apache-airflow-providers-apache-beam' is the package name, and
> airflow/providers/apache/beam is the package inside the distribution. Those
> will remain as they are - we cannot change them without breaking
> compatibility.
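>
> To make the convention concrete - a minimal sketch (the helper name is just
> illustrative, this is not actual Airflow code):
>
> def provider_id_to_names(provider_id: str) -> dict:
>     # "apache.beam" -> repo dir "apache/beam",
>     # distribution "apache-airflow-providers-apache-beam",
>     # import package "airflow.providers.apache.beam"
>     parts = provider_id.split(".")
>     return {
>         "source_dir": "/".join(parts),
>         "distribution": "apache-airflow-providers-" + "-".join(parts),
>         "import_package": "airflow.providers." + provider_id,
>     }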
> 
> So if we change it to a flat structure we will have some inconsistencies -
> in some cases the provider will be a single folder, while the packages
> inside the distribution will still be two folders deep.
> 
> I think it will do more harm than good if we get rid of the 'folder'
> structure - some of the code in breeze will have to treat those cases
> differently as well. Nothing extraordinary or very complex, but more
> complex than it should be - on top of already handling potentially
> nested folders.
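>
> As a rough illustration of what "handling one-level nesting" means for the
> tooling (a hypothetical sketch, not actual breeze code):
>
> from pathlib import Path
>
> def discover_provider_dirs(root: Path = Path("providers")) -> list[Path]:
>     # a provider root is any folder that ships its own pyproject.toml,
>     # whether it is one level deep (airbyte) or two levels deep (apache/beam)
>     return sorted(p.parent for p in root.rglob("pyproject.toml"))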
> 
> So my preference would be to stay with apache/beam - it handles the case
> where provider packages can be nested one level deep more consistently.
> 
> J
> 
> 
> On Tue, 7 Jan 2025 at 19:00, Vincent Beck <vincb...@apache.org>
> wrote:
> 
>> Good question. I always found it confusing to have some providers at
>> different levels. Examples:
>> - "airbyte" in the "providers" directory (I would qualify it as a "regular"
>> provider)
>> - "hive" in "providers/apache"
>> - "amazon" in "providers", but it contains only one subdirectory, "aws"
>> 
>> I would be in favor of using "-" instead of "/" so that all providers are
>> at the same level.
>> 
>> 
>> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote:
>>> +1 to this in general terms, it will hopefully reduce a lot of the
>>> boilerplate we need.
>>> 
>>> As for the amazon/aws example specifically, that does bring up a
>>> question: should we have `/` or `-`? To give some examples:
>>> 
>>> cncf kubernetes: ./providers/cncf/kubernetes or
>> ./providers/cncf-kubernetes
>>> Apache hive: ./providers/apache/hive or ./providers/apache-hive
>>> AWS: ./providers/amazon/aws or ./providers/amazon-aws
>>> 
>>> There is no requirement from Python etc. on one form or the other (as
>>> it’s just a folder, not part of the module name), so it’s whatever makes
>>> the most sense to us.
>>> 
>>> Jarek and Dennis (and others): what are your preferences on these styles?
>>> 
>>> -ash
>>> 
>>>> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>> 
>>>> Oh, and one other benefit of it will be that we will be able to get rid
>>>> of about 40% of the "Providers Manager" code. Currently, in the Providers
>>>> Manager we have a lot of "ifs" that make it possible to use providers in
>>>> breeze and the local environment from the sources. In a "production"
>>>> installation we are using "get_provider_info" entry points to discover
>>>> providers and check whether a provider is installed, but when you use the
>>>> current providers installed in Breeze inside "airflow", we rely on
>>>> `provider.yaml` being present in the "airflow.providers.PROVIDER_ID" path
>>>> - so we effectively have two paths for discovering information about the
>>>> installed providers.
>>>> 
>>>> After all providers are migrated to the new structure, all providers are
>>>> separate "distributions" - and when you run `uv sync` (which will install
>>>> all providers thanks to the workspace feature) or `pip install -e
>>>> ./providers/aws` (which you will have to do manually to work on the
>>>> provider if you use `pip` rather than uv) - then we will not have to use
>>>> the separate path to read provider.yaml, because the right entrypoint for
>>>> the provider will be installed as well - so we will be able to get rid of
>>>> quite some code that is currently only used in the airflow development
>>>> environment.
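>>>>
>>>> As a hedged sketch of what entry-point based discovery looks like
>>>> (assuming the "apache_airflow_provider" entry point group and a
>>>> "package-name" key in the returned dict - an illustrative sketch, not the
>>>> actual ProvidersManager code):
>>>>
>>>> from importlib.metadata import entry_points
>>>>
>>>> def discover_providers() -> dict:
>>>>     providers = {}
>>>>     # each installed provider distribution registers an entry point whose
>>>>     # callable returns the provider info dict (get_provider_info)
>>>>     for ep in entry_points(group="apache_airflow_provider"):
>>>>         info = ep.load()()
>>>>         providers[info["package-name"]] = info
>>>>     return providers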
>>>> 
>>>> J.
>>>> 
>>>> 
>>>> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>> 
>>>>> Those are very good questions :)
>>>>> 
>>>>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis
>>>>> <ferru...@amazon.com.invalid> wrote:
>>>>> 
>>>>>> To clarify that I understand your diagram correctly, let's say you
>> clone
>>>>>> the Airflow repo to ~/workspace/airflow/.  Does this mean that the
>> AWS Glue
>>>>>> Hook which used to live at
>>>>>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a random
>>>>>> example) will be located at
>>>>>> 
>> ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py?
>>>>>> That feels unnecessarily repetitive to me, maybe it makes sense but
>> I'm
>>>>>> missing the context?
>>>>>> 
>>>>> 
>>>>> Yes - it means that there is this repetitiveness, but for a good reason -
>>>>> those two "amazon/aws" parts serve different purposes (see the sketch
>>>>> right after this list):
>>>>>
>>>>> * The first "providers/amazon/aws" is just where the whole provider
>>>>> "complete project" is stored - basically the directory where the "aws
>>>>> provider" lives, a convenient folder that keeps it separate from other
>>>>> providers
>>>>> * The second "src/airflow/providers/amazon/aws" is the python package
>>>>> where the source files are stored - this is how (inside the sub-folder)
>>>>> you tell the actual python "import" where to look for the sources.
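>>>>>
>>>>> Roughly (an illustrative layout - glue.py is just the example file from
>>>>> above):
>>>>>
>>>>> providers/amazon/aws/                     <- the provider "project" folder
>>>>>     pyproject.toml
>>>>>     src/
>>>>>         airflow/providers/amazon/aws/     <- the importable python package
>>>>>             hooks/glue.py
>>>>>     tests/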
>>>>> 
>>>>> What really matters is that (eventually)
>>>>> `~/workspace/airflow/providers/amazon/aws/` can be treated as a
>>>>> completely separate python project - the source of a "standalone"
>>>>> provider distribution.
>>>>> There is a "pyproject.toml" file at the root of it and if you do this
>> (for
>>>>> example):
>>>>> 
>>>>> cd providers/amazon/aws/
>>>>> uv sync
>>>>> 
>>>>> And with it you will be able to work on the AWS provider exclusively as
>>>>> a separate project (this is not yet complete with the move - tests cannot
>>>>> be fully run today - but that will be possible as the next step - I
>>>>> explained it in
>>>>> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916)
>>>>> 
>>>>> This has a number of benefits - one of them is that you will be able to
>>>>> build the provider by just running the `build` command of your favourite
>>>>> PEP-standards-compliant build frontend:
>>>>> 
>>>>> cd providers/amazon/aws/
>>>>> `uv build` (or `hatch build` or `poetry build` or `flit build` )....
>>>>> 
>>>>> This will create the provider package inside the `dist` folder. I just
>>>>> did it in my PR with `uv` in the first "airbyte" project:
>>>>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
>>>>> Building source distribution...
>>>>> Building wheel from source distribution...
>>>>> Successfully built dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
>>>>> Successfully built
>>>>> dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
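>>>>>
>>>>> The resulting wheel can then be installed directly (illustrative command,
>>>>> using the file name from the output above):
>>>>>
>>>>> pip install dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl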
>>>>> 
>>>>> That's it. That also allows cases like installing provider packages
>>>>> using git URLs - which I used earlier today to test whether the incoming
>>>>> pygments PR actually solves the problem we had yesterday
>>>>> https://github.com/apache/airflow/pull/45416 (basically we just make our
>>>>> provider packages "standard" python packages that all the tools
>>>>> support). Anyone who would like to install a specific commit hash or
>>>>> branch version of the "airbyte" package from the main version of the
>>>>> Airflow repo will be able to do:
>>>>> 
>>>>> pip install "apache-airflow-providers-airbyte @ git+
>>>>> https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID";
>>>>> 
>>>>> Currently, in order to create the package we need to manually extract
>>>>> the "amazon" subtree, copy it elsewhere, dynamically prepare some files
>>>>> (pyproject.toml, README.rst and a few others) and only then build the
>>>>> package. All of this - copying the file structure, creating new files,
>>>>> running the build command and finally deleting the copied files - is
>>>>> currently done dynamically and under the hood by the "breeze
>>>>> release-management prepare-provider-packages" command. With this change,
>>>>> the directory structure in our `git` repo is totally standard and allows
>>>>> us (and anyone else) to build the package directly from it.
>>>>> 
>>>>> 
>>>>>> And what is the plan for system tests? As part of this reorganization,
>>>>>> could they be moved into providers/{PROVIDER_ID}/tests/system? That
>>>>>> seems more intuitive to me than their current location in
>>>>>> providers/tests/system/{PROVIDER_ID}/example_foo.py.
>>>>>> 
>>>>>> 
>>>>> Oh yeah - I missed that in the original structure because the "airbyte"
>>>>> provider (which I chose as the first one) did not contain "system" tests
>>>>> - but with one of the next two providers I was planning to make sure
>>>>> system tests are covered. They are supposed to be moved to "tests/system"
>>>>> of course - Elad had a similar question and I also explained it in detail
>>>>> in
>>>>> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
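>>>>>
>>>>> So the target layout will be roughly (illustrative - the exact
>>>>> sub-structure under tests/ may still differ):
>>>>>
>>>>> providers/{PROVIDER_ID}/
>>>>>     src/airflow/providers/{PROVIDER_ID}/...
>>>>>     tests/
>>>>>         system/    <- system tests (example_foo.py and friends) move here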
>>>>> 
>>>>> 
>>>>> I hope this answers the questions. If not - I am happy to add more
>>>>> clarifications :)
>>>>> 
>>>>> 
>>>>>> J.
>>>>>> 
>>>>> 
>>> 
>>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org
