And just to clarify - there is a little too much redundancy in your example:
`airflow/providers/airbyte/hooks/airbyte.py` becomes `airflow /providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py` The leading "airflow" is not there in the target path: `airflow/providers/airbyte/hooks/airbyte.py` becomes `providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py` And the redundancy comes from the fact that we have to somehow name the directory where the entire provider is moved. Let me rewrite it differently: providers/airbyte <- This is where the airbyte provider is moved | src/airflow/providers/airbyte/hooks/airbyte.py <- this is actual path "inside" the provider So there is far less redundancy than you think. On Tue, Dec 13, 2022 at 2:40 AM Jarek Potiuk <ja...@potiuk.com> wrote: > >> I have two immediate thoughts on it. First, can this be used as an >> opportunity to reorganize the system tests tree into the provider's test >> tree? Instead of having `tests/system/providers/{foo}`, maybe we can move >> those system tests alongside the other provider tests like this: >> > > Somehow what you wanted to add here was lost so I am not sure what the > proposal is :). > > >> My other thought is that I think I am missing something here as far as >> the new organization goes. I haven't spent the time that you have spent >> looking at how to make this work, and I also realize this is early PoC >> discussion so maybe this isn't the end target, but the new paths are very >> long and redundant. Is that for automation reasons? For example in the >> image you shared: >> > > >> `airflow/providers/airbyte/hooks/airbyte.py` becomes >> `airflow/providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py` >> `tests/providers/airbyte/hooks/test_airbyte.py` becomes >> `airflow/providers/airbyte/ >> tests/airflow/providers/airbyte/hooks/test_airbyte.py. >> > > For sources, I am quite sure we cannot do it - as this would end up with a > non-standard package and we would have to make some non-standard > manipulations with import paths and it would confuse multiple IDEs like > crazy. > Our providers are in "airflow.providers.<provider>" package. The "src" is > the root of sources and we need to have "airflow"/"providers" as the > package in. > > Also this choice was largely based on the official Python packaging > tutorial: > https://packaging.python.org/en/latest/tutorials/packaging-projects/ > (this is where I took the "src" from - it's not been very common in Python > Projects - including Airflow - but it is now recommended in multiple places > including official Pypa tutorial: > > packaging_tutorial/ > └── src/ > └── example_package_YOUR_USERNAME_HERE/ > ├── __init__.py > └── example.py > > Regarding tests - yes, that could be removed. I even had it like that > initially. And I am on the fence on this one. > > Putting tests at the top level has some drawbacks. One drawback is that if > we put tests at the top, we miss "namespace". Having a "namespace" makes > tests unique even if they are defined in different providers and are both > put on PYTHONPATH > > Imagine provider_a defining "tests/hooks/test_utils.py" and provider_b > defining "tests/utils/test_utils.py". When you import > "tests.utils.test_utils" - which one do you import if they are all in the > PYTHONPATH? This will become a problem when we will make all of them added > to PYTHONPATH in your IDE - I think. I believe many IDEs (including PyCharm > and VSCode) will not work with multiple, separate python modules and they > will get confused when seeing multiple repeated packages with the same > modules on PYTHONPATH. > <package_name> gives us "namespace" and guarantees that each module will > be different. > > Also keeping them in packages is pretty consistent with some guidelines I > saw that "tests/<package>" is recommended. I would love however to hear > more from others experiences (especially TP) as maybe there are other ways > I have not thought about. I would also prefer "tests/test_mmm.py" rather > than "tests/airflow/providers/providers/test_mmm.py" - but I am afraid it's > going to be difficult due to the "repeated modules" problem. > > > >>