Not strong at all, preference Is all. It sounds like Vincent and I are in the 
hyphen camp and you and Maciej are in the slash camp.

+1 on the “I don’t care what code style is used as long as it is 
programmatically enforced”.

-a

> On 10 Jan 2025, at 09:41, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
> Is there anything else that "tastes" Ash ? A concrete reason that makes you
> think the "-" prefix in this case is better than the "/" folder?  How
> strong is your "taste" preference and do you think it will have some
> lasting effect if we choose to flatten the folder structure?
> 
> I might make a small vote to see what is the preference of people if we
> think this is an important aspect.
> 
> BTW. This is why I really love black/ruff formatting - we stopped wasting
> time on "taste" discussion - it does not matter what is the individual
> preference, consistency is more important and I prefer to do stuff that
> really matters but if people feel strongly that we should discuss it, I
> might make a vote there.
> 
> J.
> 
> On Thu, Jan 9, 2025 at 5:48 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> 
>> My preference is for being “more direct” and not having deeply nested
>> things where possible — I think Microsoft might be the one case where
>> having extra folders makes sense. And I’m fine with things not being
>> consistent across providers/groups of providers.
>> 
>> -ash
>> 
>> 
>> 
>>> On 8 Jan 2025, at 17:18, Jarek Potiuk <ja...@potiuk.com> wrote:
>>> 
>>> Can you give an example of what might break why having
>>> `providers/aapche-beam/src/airflow/providers/apache/beam`?
>>> 
>>> Nothing will break. It's just:
>>> 
>>> * the code will have to be a little more complex as it will have to do
>> some
>>> conditional writes of "-" "/"
>>> * there will be inconsistency in the depth of folders - outside it will
>> be
>>> 1, inside it will be 2 (as it is in your example)/
>>> * it will be a bit more convention/ complex to limit related providers
>> (say
>>> microsoft) - with the current scheme "providers/microsoft" is the
>> directory
>>> containing all microsoft providers. If we change it to "-", you have to
>>> find all sub-directories following "microsoft-*" convention.
>>> 
>>> I am not super-strong on it - we could do either, it's just my preference
>>> to use folders for grouping related things (as folders were designed
>> for).
>>> 
>>> J.
>>> 
>>> On Wed, Jan 8, 2025 at 5:03 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>> 
>>>>> And we already have a number of mappings and conventions to handle
>> that.
>>>>> For example provider I'd mapping to dirs (apache.beam -> apache/beam),
>>>> and
>>>>> 'apache-airflow-providers-apache-beam' as package na e  and
>>>>> airflow/providers/apache/beam as packages inside the distribution.
>> Those
>>>>> will remain as they are - we cannot change them without breaking
>>>>> compatibility.
>>>> 
>>>> Can you give an example of what might break why having
>>>> `providers/aapche-beam/src/airflow/providers/apache/beam`?
>>>> 
>>>> -a
>>>> 
>>>>> On 7 Jan 2025, at 18:33, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>> 
>>>>> I think it will be better to keep it.
>>>>> 
>>>>> The reason we have varying levels were to group things together -
>> mainly
>>>>> Apache related providers, but also Microsoft.
>>>>> 
>>>>> And we already have a number of mappings and conventions to handle
>> that.
>>>>> For example provider I'd mapping to dirs (apache.beam -> apache/beam),
>>>> and
>>>>> 'apache-airflow-providers-apache-beam' as package na e  and
>>>>> airflow/providers/apache/beam as packages inside the distribution.
>> Those
>>>>> will remain as they are - we cannot change them without breaking
>>>>> compatibility.
>>>>> 
>>>>> So if we change it to a flat structure we will have some
>> inconsistencies
>>>> -
>>>>> in some cases it will be single folder in others (packages) those will
>> be
>>>>> two folders.
>>>>> 
>>>>> I think it will be more harm than good if we get rid of the 'folder'
>>>>> structures - some of the code in breeze will have to treat those
>>>>> differently as well. Nothing extraordinary and very complex but more
>>>>> complex-ish than it should be - already on top of handling potentially
>>>>> nested folders
>>>>> 
>>>>> So my preference would be to stay with apache/beam - it's just more
>>>>> consistently handling the case where provider packages can be one-level
>>>>> nested
>>>>> 
>>>>> J
>>>>> 
>>>>> 
>>>>> wt., 7 sty 2025, 19:00 użytkownik Vincent Beck <vincb...@apache.org>
>>>>> napisał:
>>>>> 
>>>>>> Good question. I always found it confusing to have some providers at
>>>>>> different level. Examples:
>>>>>> - "airbyte" in "providers" directory (I would qualify it as "regular"
>>>>>> provider)
>>>>>> - "hive" in "providers/apache"
>>>>>> - "amazon" in "providers" but which contains only one sub directory
>>>> "aws"
>>>>>> 
>>>>>> I would be in favor of using "-" instead of "/" so that all providers
>>>> are
>>>>>> at the same level.
>>>>>> 
>>>>>> 
>>>>>> On 2025/01/07 16:38:10 Ash Berlin-Taylor wrote:
>>>>>>> +1 one to this on general terms, it will hopefully reduce a lot of
>> the
>>>>>> boilerplate we need.
>>>>>>> 
>>>>>>> As for the amazon/aws example specifically that does bring up a
>>>>>> question, should we have `/` or `-`.. to give some examples:
>>>>>>> 
>>>>>>> cncf kubernetes: ./providers/cncf/kubernetes or
>>>>>> ./providers/cncf-kubernetes
>>>>>>> Apache hive: ./providers/apache/hive or ./providers/apache-hive
>>>>>>> AWS: ./providers/amazon/aws or ./providers/amazon-aws
>>>>>>> 
>>>>>>> There is no requirement from python etc on one form or the other (as
>>>>>> it’s just a folder, not part of the module name), so it’s what ever
>>>> makes
>>>>>> most sense to us.
>>>>>>> 
>>>>>>> Jarek and Dennis (and others): what are your preferences on these
>>>> styles?
>>>>>>> 
>>>>>>> -ash
>>>>>>> 
>>>>>>>> On 6 Jan 2025, at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>>>> 
>>>>>>>> Oh. . And one other benefit of it will be that we will be able to
>> get
>>>>>> rid
>>>>>>>> of about 40% of the "Providers Manager" code. Currently, in
>> Providers
>>>>>>>> manager we have a lot of "ifs" that make it possible to use
>> providers
>>>>>> in
>>>>>>>> breeze and local environment from the sources. In "production"
>>>>>> installation
>>>>>>>> we are using "get_provider_info"  entry points to discover providers
>>>>>> and
>>>>>>>> discover if provider is installed, but when you use current
>> providers
>>>>>>>> installed in Breeze to inside "airflow", we rely on `provider.yaml`
>> to
>>>>>> be
>>>>>>>> present in the "airflow.providers.PROVIDER_ID" path - so we
>>>> effectively
>>>>>>>> have two paths of discovering information about the providers
>>>>>> installed.
>>>>>>>> 
>>>>>>>> After all providers are migrated to the new structure, all providers
>>>>>> are
>>>>>>>> separate "distributions" - and when you run `uv sync`  (which will
>>>>>> install
>>>>>>>> all providers thanks to workspace feature) or `pip install -e
>>>>>>>> ./providers/aws` (which you will have to do manually to work on the
>>>>>>>> provider - if you use `pip` rather than uv) - then we will not have
>> to
>>>>>> use
>>>>>>>> the separate path to read provider.yaml, because the right
>> entrypoint
>>>>>> for
>>>>>>>> the provider will be installed as well - so we will be able to get
>> rid
>>>>>> of
>>>>>>>> quite some code that is currently only used in airflow development
>>>>>>>> environment.
>>>>>>>> 
>>>>>>>> J.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Jan 6, 2025 at 11:41 PM Jarek Potiuk <ja...@potiuk.com>
>>>> wrote:
>>>>>>>> 
>>>>>>>>> Those are very good questions :)
>>>>>>>>> 
>>>>>>>>> On Mon, Jan 6, 2025 at 10:54 PM Ferruzzi, Dennis
>>>>>>>>> <ferru...@amazon.com.invalid> wrote:
>>>>>>>>> 
>>>>>>>>>> To clarify that I understand your diagram correctly, let's say you
>>>>>> clone
>>>>>>>>>> the Airflow repo to ~/workspace/airflow/.  Does this mean that the
>>>>>> AWS Glue
>>>>>>>>>> Hook which used to live at
>>>>>>>>>> ~/workspace/airflow/providers/amazon/aws/hooks/glue.py (as a
>> random
>>>>>>>>>> example) will be located at
>>>>>>>>>> 
>>>>>> 
>>>> 
>> ~/workspace/airflow/providers/amazon/aws/src/airflow/providers/amazon/aws/hooks/glue.py?
>>>>>>>>>> That feels unnecessarily repetitive to me, maybe it makes sense
>> but
>>>>>> I'm
>>>>>>>>>> missing the context?
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Yes - it means that there is this repetitiveness but for a good
>>>>>> reason -
>>>>>>>>> those two "amazon/aws" serve different purpose:
>>>>>>>>> 
>>>>>>>>> * The first "providers/amazon/aws" is just where the whole provider
>>>>>>>>> "complete project" is stored - it's basically a directory where
>> "aws
>>>>>>>>> provider" is stored, a convenient folder to locate it in, that
>> makes
>>>>>> it
>>>>>>>>> separate from other providers
>>>>>>>>> * The second "src/airflow/providers/amazon/aws" - is the python
>>>>>>>>> package where the source files is stored - this is how (inside the
>>>>>>>>> sub-folder) you tell the actual python "import" to look for the
>>>>>> sources.
>>>>>>>>> 
>>>>>>>>> .What really matters is that (eventually)
>>>>>>>>> `~/workspace/airflow/providers/amazon/aws/` can be treated as a
>>>>>> completely
>>>>>>>>> separate python project - a source of a "standalone" provider
>> python
>>>>>>>>> project.
>>>>>>>>> There is a "pyproject.toml" file at the root of it and if you do
>> this
>>>>>> (for
>>>>>>>>> example):
>>>>>>>>> 
>>>>>>>>> cd providers/amazon/aws/
>>>>>>>>> uv sync
>>>>>>>>> 
>>>>>>>>> And with it you will be able to work on AWS provider exclusively
>> as a
>>>>>>>>> separate project (this is not yet complete with the move - tests
>> are
>>>>>> not
>>>>>>>>> entirely possible to run today - but it will be possible as next
>> step
>>>>>> - I
>>>>>>>>> explained it in
>>>>>>>>> 
>> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
>>>>>>>>> 
>>>>>>>>> This has a number of benefits, but one of them is that you will be
>>>>>> able to
>>>>>>>>> build provider by just running `build` command of your favourite
>>>>>>>>> PEP-standard compliant frontend:
>>>>>>>>> 
>>>>>>>>> cd providers/amazon/aws/
>>>>>>>>> `uv build` (or `hatch build` or `poetry build` or `flit build`
>> )....
>>>>>>>>> 
>>>>>>>>> This will create  the provider package inside the `dist" folder. I
>>>>>> just
>>>>>>>>> did it in my PR with `uv` in the first "airbyte` project:
>>>>>>>>> 
>>>>>>>>> root@d74b3136d62f:/opt/airflow/providers/airbyte# uv build
>>>>>>>>> Building source distribution...
>>>>>>>>> Building wheel from source distribution...
>>>>>>>>> Successfully built
>> dist/apache_airflow_providers_airbyte-5.0.0.tar.gz
>>>>>>>>> Successfully built
>>>>>>>>> dist/apache_airflow_providers_airbyte-5.0.0-py3-none-any.whl
>>>>>>>>> 
>>>>>>>>> That's it. That also allows cases like installing provider packages
>>>>>> using
>>>>>>>>> git URLs - which I used earlier today to test if the incoming PR of
>>>>>>>>> pygments is actually solving the problem we had yesteday
>>>>>>>>> https://github.com/apache/airflow/pull/45416  (basically we just
>>>>>> make our
>>>>>>>>> provider packages "standard" python packages that all the tools
>>>>>> support.
>>>>>>>>> Anyone who would like to install a commit, hash or branch version
>> of
>>>>>> the
>>>>>>>>> "airbyte" package from main version of Airflow repo will be able to
>>>>>> do:
>>>>>>>>> 
>>>>>>>>> pip install "apache-airflow-providers-airbyte @ git+
>>>>>>>>> https://github.com/apache/airflow.git/providers/airbyte@COMMIT_ID";
>>>>>>>>> 
>>>>>>>>> Currently in order to create the package we need to manually
>> extract
>>>>>> the
>>>>>>>>> "amazon" subtree, copy it elsewhere, prepare dynamically some files
>>>>>>>>> (pyproject.toml, README.rst and few others) and only then we  build
>>>>>> the
>>>>>>>>> package. All this - copying file structure, creating new files,
>>>>>> running the
>>>>>>>>> build command after and finally deleting the copied files is now -
>>>>>>>>> dynamically and under-the-hood created by "breeze
>> release-management
>>>>>>>>> prepare-provider-packages" command. With this change, the directory
>>>>>>>>> structure in `git` repo of ours is totally standard and allows us
>>>> (and
>>>>>>>>> anyone else) to build the package directly from it.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> And what is the plan for system tests?   As part of this
>>>>>> reorganization,
>>>>>>>>>> could they be moved into providers/{PROVIDER_ID}/tests/system?
>> That
>>>>>> seems
>>>>>>>>>> more intuitive to me than their current location in
>>>>>>>>>> providers/tests/system/{PROVIDER_ID}/example_foo.py.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> Oh yeah - I missed that in the original structure as the "airbyte"
>>>>>>>>> provider (that I chose as first one) did not contain the "system"
>>>>>> tests -
>>>>>>>>> but one of the two providers after that i was planning to make sure
>>>>>> system
>>>>>>>>> tests are covered. They are supposed to be moved to "tests/system"
>> of
>>>>>>>>> course - Elad had similar question and I also explained it in
>> detail
>>>>>> in
>>>>>>>>> 
>> https://github.com/apache/airflow/pull/45259#issuecomment-2572427916
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I hope it answers the questions. If not - I am happy to add more
>>>>>>>>> clarifications :)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> J.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>> 
>>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> For additional commands, e-mail: dev-h...@airflow.apache.org
>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org

Reply via email to