>  Somehow what you wanted to add here was lost so I am not sure what the 
> proposal is :).


Gah,  I knew I should have explained it as well.  I was basically suggesting 
move the system tests for a given provider into that provider's tests module, 
alongside hooks, operators, sensors, etc:


tests/providers/airbyte/hooks
tests/providers/airbyte/operators
tests/providers/airbyte/sensors
tests/providers/airbyte/system
tests/providers/airbyte/transfers




________________________________
From: Jarek Potiuk <ja...@potiuk.com>
Sent: Monday, December 12, 2022 5:48 PM
To: dev@airflow.apache.org
Subject: RE: [EXTERNAL][PROPOSAL] (Internal) move of provider packages to 
isolated "providers" sub-folders


CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you can confirm the sender and know the 
content is safe.


And just to clarify - there is a little too much redundancy in your example:

`airflow/providers/airbyte/hooks/airbyte.py` becomes 
`airflow/providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py`

The leading "airflow" is not there in the target path:

`airflow/providers/airbyte/hooks/airbyte.py` becomes 
`providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py`

And the redundancy comes from the fact that we have to somehow name the 
directory where the entire provider is moved. Let me rewrite it differently:

providers/airbyte <- This is where the airbyte provider is moved
                |
                src/airflow/providers/airbyte/hooks/airbyte.py <- this is 
actual path "inside" the provider


So there is far less redundancy than you think.


On Tue, Dec 13, 2022 at 2:40 AM Jarek Potiuk 
<ja...@potiuk.com<mailto:ja...@potiuk.com>> wrote:

I have two immediate thoughts on it.   First, can this be used as an 
opportunity to reorganize the system tests tree into the provider's test tree?  
Instead of having `tests/system/providers/{foo}`, maybe we can move those 
system tests alongside the other provider tests like this:

Somehow what you wanted to add here was lost so I am not sure what the proposal 
is :).


My other thought is that I think I am missing something here as far as the new 
organization goes.  I haven't spent the time that you have spent looking at how 
to make this work, and I also realize this is early PoC discussion so maybe 
this isn't the end target, but the new paths are very long and redundant.  Is 
that for automation reasons?  For example in the image you shared:



`airflow/providers/airbyte/hooks/airbyte.py` becomes 
`airflow/providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py`
`tests/providers/airbyte/hooks/test_airbyte.py` becomes 
`airflow/providers/airbyte/tests/airflow/providers/airbyte/hooks/test_airbyte.py.

For sources, I am quite sure we cannot do it - as this would end up with a 
non-standard package and we would have to make some non-standard manipulations 
with import paths and it would confuse multiple IDEs like crazy.
Our providers are in "airflow.providers.<provider>" package. The "src" is the 
root of sources and we need to have "airflow"/"providers" as the package in.

Also this choice was largely based on the official Python packaging tutorial: 
https://packaging.python.org/en/latest/tutorials/packaging-projects/ (this is 
where I took the "src" from - it's not been very common in Python Projects - 
including Airflow - but it is now recommended in multiple places including 
official Pypa tutorial:

packaging_tutorial/
└── src/
    └── example_package_YOUR_USERNAME_HERE/
        ├── __init__.py
        └── example.py

Regarding tests - yes, that could be removed. I even had it like that 
initially. And I am on the fence on this one.

Putting tests at the top level has some drawbacks. One drawback is that if we 
put tests at the top, we miss "namespace". Having a "namespace" makes tests 
unique even if they are defined in different providers and are both put on 
PYTHONPATH

Imagine provider_a defining "tests/hooks/test_utils.py" and provider_b defining 
"tests/utils/test_utils.py". When you import "tests.utils.test_utils" - which 
one do you import if they are all in the PYTHONPATH? This will become a problem 
when we will make all of them added to PYTHONPATH in your IDE - I think. I 
believe many IDEs (including PyCharm and VSCode) will not work with multiple, 
separate python modules and they will get confused when seeing multiple 
repeated packages with the same modules on PYTHONPATH.
<package_name> gives us "namespace" and guarantees that each module will be 
different.

Also keeping them in packages is pretty consistent with some guidelines I saw 
that "tests/<package>" is recommended. I would love however to hear more from 
others experiences (especially TP) as maybe there are other ways I have not 
thought about. I would also prefer "tests/test_mmm.py" rather than 
"tests/airflow/providers/providers/test_mmm.py" - but I am afraid it's going to 
be difficult due to the "repeated modules" problem.



Reply via email to