My preference is for being “more direct” and not having deeply nested things 
where possible — I think Microsoft might be the one case where having extra 
folders makes sense. And I’m fine with things not being consistent across 
providers/groups of providers.

-ash



> On 8 Jan 2025, at 17:18, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
> Can you give an example of what might break why having
> `providers/aapche-beam/src/airflow/providers/apache/beam`?
> 
> Nothing will break. It's just:
> 
> * the code will have to be a little more complex as it will have to do some
> conditional writes of "-" "/"
> * there will be inconsistency in the depth of folders - outside it will be
> 1, inside it will be 2 (as it is in your example)/
> * it will be a bit more convention/ complex to limit related providers (say
> microsoft) - with the current scheme "providers/microsoft" is the directory
> containing all microsoft providers. If we change it to "-", you have to
> find all sub-directories following "microsoft-*" convention.
> 
> I am not super-strong on it - we could do either, it's just my preference
> to use folders for grouping related things (as folders were designed for).
> 
> J.
> 
> On Wed, Jan 8, 2025 at 5:03 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> 
>>> And we already have a number of mappings and conventions to handle that.
>>> For example provider I'd mapping to dirs (apache.beam -> apache/beam),
>> and
>>> 'apache-airflow-providers-apache-beam' as package na e  and
>>> airflow/providers/apache/beam as packages inside the distribution. Those
>>> will remain as they are - we cannot change them without breaking
>>> compatibility.
>> 
>> Can you give an example of what might break why having
>> `providers/aapche-beam/src/airflow/providers/apache/beam`?
>> 
>> -a
>> 
>>> On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> wrote:
>>> 
>>> I think it will be better to keep it.
>>> 
>>> The reason we have varying levels were to group things together - mainly
>>> Apache related providers, but also Microsoft.
>>> 
>>> And we already have a number of mappings and conventions to handle that.
>>> For example provider I'd mapping to dirs (apache.beam -> apache/beam),
>> and
>>> 'apache-airflow-providers-apache-beam' as package na e  and
>>> airflow/providers/apache/beam as packages inside the distribution. Those
>>> will remain as they are - we cannot change them without breaking
>>> compatibility.
>>> 
>>> So if we change it to a flat structure we will have some inconsistencies
>> -
>>> in some cases it will be single folder in others (packages) those will be
>>> two folders.
>>> 
>>> I think it will be more harm than good if we get rid of the 'folder'
>>> structures - some of the code in breeze will have to treat those
>>> differently as well. Nothing extraordinary and very complex but more
>>> complex-ish than it should be - already on top of handling potentially
>>> nested folders
>>> 
>>> So my preference would be to stay with apache/beam - it's just more
>>> consistently handling the case where provider packages can be one-level
>>> nested
>>> 
>>> J
>>> 
>>> 
>>> wt., 7 sty 2025, 19:00 użytkownik Vincent Beck <vincb...@apache.org>
>>> napisał:
>>> 
>>>> Good question. I always found it confusing to have some providers at
>>>> different level. Examples:
>>>> - "airbyte" in "providers" directory (I would qualify it as "regular"
>>>> provider)
>>>> - "hive" in "providers/apache"
>>>> - "amazon" in "providers" but which contains only one sub directory
>> "aws"
>>>> 
>>>> I would be in favor of using "-" instead of "/" so that all providers
>> are
>>>> at the same level.
>>>> 
>>>> 
>>>> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote:
>>>>> +1 one to this on general terms, it will hopefully reduce a lot of the
>>>> boilerplate we need.
>>>>> 
>>>>> As for the amazon/aws example specifically that does bring up a
>>>> question, should we have `/` or `-`.. to give some examples:
>>>>> 
>>>>> cncf kubernetes: ./providers/cncf/kubernetes or
>>>> ./providers/cncf-kubernetes
>>>>> Apache hive: ./providers/apache/hive or ./providers/apache-hive
>>>>> AWS: ./providers/amazon/aws or ./providers/amazon-aws
>>>>> 
>>>>> There is no requirement from python etc on one form or the other (as
>>>> it’s just a folder, not part of the module name), so it’s what ever
>> makes
>>>> most sense to us.
>>>>> 
>>>>> Jarek and Dennis (and others): what are your preferences on these
>> styles?
>>>>> 
>>>>> -ash
>>>>> 
>>>>>> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>> 
>>>>>> Oh. . And one other benefit of it will be that we will be able to get
>>>> rid
>>>>>> of about 40% of the "Providers Manager" code. Currently, in Providers
>>>>>> manager we have a lot of "ifs" that make it possible to use providers
>>>> in
>>>>>> breeze and local environment from the sources. In "production"
>>>> installation
>>>>>> we are using "get_provider_info"  entry points to discover providers
>>>> and
>>>>>> discover if provider is installed, but when you use current providers
>>>>>> installed in Breeze to inside "airflow", we rely on `provider.yaml` to
>>>> be
>>>>>> present in the "airflow.providers.PROVIDER_ID" path - so we
>> effectively
>>>>>> have two paths of discovering information about the providers
>>>> installed.
>>>>>> 
>>>>>> After all providers are migrated to the new structure, all providers
>>>> are
>>>>>> separate "distributions" - and when you run `uv sync`  (which will
>>>> install
>>>>>> all providers thanks to workspace feature) or `pip install -e
>>>>>> ./providers/aws` (which you will have to do manually to work on the
>>>>>> provider - if you use `pip` rather than uv) - then we will not have to
>>>> use
>>>>>> the separate path to read provider.yaml, because the right entrypoint
>>>> for
>>>>>> the provider will be installed as well - so we will be able to get rid
>>>> of
>>>>>> quite some code that is currently only used in airflow development
>>>>>> environment.
>>>>>> 
>>>>>> J.
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com>
>> wrote:
>>>>>> 
>>>>>>> Those are very good questions :)
>>>>>>> 
>>>>>>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis
>>>>>>> <ferru...@amazon.com.invalid> wrote:
>>>>>>> 
>>>>>>>> To clarify that I understand your diagram correctly, let's say you
>>>> clone
>>>>>>>> the Airflow repo to ~/workspace/airflow/.  Does this mean that the
>>>> AWS Glue
>>>>>>>> Hook which used to live at
>>>>>>>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a random
>>>>>>>> example) will be located at
>>>>>>>> 
>>>> 
>> ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py?
>>>>>>>> That feels unnecessarily repetitive to me, maybe it makes sense but
>>>> I'm
>>>>>>>> missing the context?
>>>>>>>> 
>>>>>>> 
>>>>>>> Yes - it means that there is this repetitiveness but for a good
>>>> reason -
>>>>>>> those two "amazon/aws" serve different purpose:
>>>>>>> 
>>>>>>> * The first "providers/amazon/aws" is just where the whole provider
>>>>>>> "complete project" is stored - it's basically a directory where "aws
>>>>>>> provider" is stored, a convenient folder to locate it in, that makes
>>>> it
>>>>>>> separate from other providers
>>>>>>> * The second "src/airflow/providers/amazon/aws" - is the python
>>>>>>> package where the source files is stored - this is how (inside the
>>>>>>> sub-folder) you tell the actual python "import" to look for the
>>>> sources.
>>>>>>> 
>>>>>>> .What really matters is that (eventually)
>>>>>>> `~/workspace/airflow/providers/amazon/aws/` can be treated as a
>>>> completely
>>>>>>> separate python project - a source of a "standalone" provider python
>>>>>>> project.
>>>>>>> There is a "pyproject.toml" file at the root of it and if you do this
>>>> (for
>>>>>>> example):
>>>>>>> 
>>>>>>> cd providers/amazon/aws/
>>>>>>> uv sync
>>>>>>> 
>>>>>>> And with it you will be able to work on AWS provider exclusively as a
>>>>>>> separate project (this is not yet complete with the move - tests are
>>>> not
>>>>>>> entirely possible to run today - but it will be possible as next step
>>>> - I
>>>>>>> explained it in
>>>>>>> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
>>>>>>> 
>>>>>>> This has a number of benefits, but one of them is that you will be
>>>> able to
>>>>>>> build provider by just running `build` command of your favourite
>>>>>>> PEP-standard compliant frontend:
>>>>>>> 
>>>>>>> cd providers/amazon/aws/
>>>>>>> `uv build` (or `hatch build` or `poetry build` or `flit build` )....
>>>>>>> 
>>>>>>> This will create  the provider package inside the `dist" folder. I
>>>> just
>>>>>>> did it in my PR with `uv` in the first "airbyte` project:
>>>>>>> 
>>>>>>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
>>>>>>> Building source distribution...
>>>>>>> Building wheel from source distribution...
>>>>>>> Successfully built dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
>>>>>>> Successfully built
>>>>>>> dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
>>>>>>> 
>>>>>>> That's it. That also allows cases like installing provider packages
>>>> using
>>>>>>> git URLs - which I used earlier today to test if the incoming PR of
>>>>>>> pygments is actually solving the problem we had yesteday
>>>>>>> https://github.com/apache/airflow/pull/45416  (basically we just
>>>> make our
>>>>>>> provider packages "standard" python packages that all the tools
>>>> support.
>>>>>>> Anyone who would like to install a commit, hash or branch version of
>>>> the
>>>>>>> "airbyte" package from main version of Airflow repo will be able to
>>>> do:
>>>>>>> 
>>>>>>> pip install "apache-airflow-providers-airbyte @ git+
>>>>>>> https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID";
>>>>>>> 
>>>>>>> Currently in order to create the package we need to manually extract
>>>> the
>>>>>>> "amazon" subtree, copy it elsewhere, prepare dynamically some files
>>>>>>> (pyproject.toml, README.rst and few others) and only then we  build
>>>> the
>>>>>>> package. All this - copying file structure, creating new files,
>>>> running the
>>>>>>> build command after and finally deleting the copied files is now -
>>>>>>> dynamically and under-the-hood created by "breeze release-management
>>>>>>> prepare-provider-packages" command. With this change, the directory
>>>>>>> structure in `git` repo of ours is totally standard and allows us
>> (and
>>>>>>> anyone else) to build the package directly from it.
>>>>>>> 
>>>>>>> 
>>>>>>> And what is the plan for system tests?   As part of this
>>>> reorganization,
>>>>>>>> could they be moved into providers/{PROVIDER_ID}/tests/system?  That
>>>> seems
>>>>>>>> more intuitive to me than their current location in
>>>>>>>> providers/tests/system/{PROVIDER_ID}/example_foo.py.
>>>>>>>> 
>>>>>>>> 
>>>>>>> Oh yeah - I missed that in the original structure as the "airbyte"
>>>>>>> provider (that I chose as first one) did not contain the "system"
>>>> tests -
>>>>>>> but one of the two providers after that i was planning to make sure
>>>> system
>>>>>>> tests are covered. They are supposed to be moved to "tests/system" of
>>>>>>> course - Elad had similar question and I also explained it in detail
>>>> in
>>>>>>> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
>>>>>>> 
>>>>>>> 
>>>>>>> I hope it answers the questions. If not - I am happy to add more
>>>>>>> clarifications :)
>>>>>>> 
>>>>>>> 
>>>>>>>> J.
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>> 
>>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> For additional commands, e-mail: dev-h...@airflow.apache.org
>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org

Reply via email to