+7000
This sort of restructure has been on the idea list in the back of my head for at least a year if not two, but didn't ever crystlaize enough to do form it in to a concrete proposal. Love the prposed file structure, it was exactly what I had in mind. While it is verbose of having the path in src and test I like it as it makes it clear what the test is covering, or the inverse, if I'm looking at src/a/b.py, I know to expect the tests in tests/a/test_b.py. Personal preference and I wouldn't object if people felt strongly about reducing the depth in test. As for choice of builder one thing I have seen mentioned is https://pypi.org/project/hatch/ and https://pypi.org/project/hatch-fancy-pypi-readme/ as a nice sweetener to go with that. I have not used hatch or flit myself. (I have used Poetry, and while I like it I don't think it would be right for Airflow as it doesn't match some of our expectations.) AIP or not I don't actually mind on this one. It's a big change in terms of files, but not that big in terms of code or architecture of Airflow as a system. I think I lean slightly towards vote but no AIP needed. -ash On Dec 14 2022, at 11:46 am, Jarek Potiuk <ja...@potiuk.com> wrote: > Hey Denis, > > > tests/providers/airbyte/hooks > > tests/providers/airbyte/operators > > tests/providers/airbyte/sensors > > > tests/providers/airbyte/system > > tests/providers/airbyte/transfers > > Yes. It could be done like that - but it would require merging some packages > (there are for example already "system" packages for amazon and google to > keep "utils" there. > > I might attempt to see if this is possible to merge, but this might also be > problematic. I want to be able to get the automation in place to make the > migration fully automated - so that there is no manual interventions. This > will allow people to review the changes, and we can test development > environment integration and then re-apply the changes automatically while > rebasing. > > This change is disruptive enough and if we try to "manipulate" with the files > moving around too much, it might be too problematic to keep PR merged while > reviewing it and making sure the CI is fixed and development environment > works - this will take weeks to get right or even months I am afraid, with > some experimentation and trying different approaches. As you mentioned - this > is an early POC and there is still plenty of work to do on that one. If we > find that we cannot do it in this big change, then we can always split it to > a separate steps. While we will have all the providers still in the same > repository, we can still make "moving system tests" a separate follow-up PR > which will be far less disruptive. > > Hey Everyone. > > I also think that one deserves an AIP - just the lenght of my email indicates > it. I would turn my long email into one if you think it is a good idea, this > way we can keep it up-to-date and explain whatever we can come up with. I > have a feeling we will get a few "u-turns" while we try different aspects of > such separation. > > WDYT? > > J. > > > > On Tue, Dec 13, 2022 at 3:54 AM Ferruzzi, Dennis > <ferru...@amazon.com.invalid> wrote: > > > Somehow what you wanted to add here was lost so I am not sure what the > > > proposal is :). > > > > > > Gah, I knew I should have explained it as well. I was basically suggesting > > move the system tests for a given provider into that provider's tests > > module, alongside hooks, operators, sensors, etc: > > > > tests/providers/airbyte/hooks > > tests/providers/airbyte/operators > > tests/providers/airbyte/sensors > > > > tests/providers/airbyte/system > > tests/providers/airbyte/transfers > > > > > > > > > > > > From: Jarek Potiuk <ja...@potiuk.com (mailto:ja...@potiuk.com)> > > Sent: Monday, December 12, 2022 5:48 PM > > To: dev@airflow.apache.org (mailto:dev@airflow.apache.org) > > Subject: RE: [EXTERNAL][PROPOSAL] (Internal) move of provider packages to > > isolated "providers" sub-folders > > > > > > > > CAUTION: This email originated from outside of the organization. Do not > > click links or open attachments unless you can confirm the sender and know > > the content is safe. > > > > > > > > > > > > > > > > > > And just to clarify - there is a little too much redundancy in your example: > > > > `airflow/providers/airbyte/hooks/airbyte.py` becomes > > `airflow/providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py` > > > > The leading "airflow" is not there in the target path: > > > > `airflow/providers/airbyte/hooks/airbyte.py` becomes > > `providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py` > > > > And the redundancy comes from the fact that we have to somehow name the > > directory where the entire provider is moved. Let me rewrite it differently: > > > > providers/airbyte <- This is where the airbyte provider is moved > > | > > src/airflow/providers/airbyte/hooks/airbyte.py <- this is actual path > > "inside" the provider > > > > > > > > So there is far less redundancy than you think. > > > > > > On Tue, Dec 13, 2022 at 2:40 AM Jarek Potiuk <ja...@potiuk.com > > (mailto:ja...@potiuk.com)> wrote: > > > > > > > > I have two immediate thoughts on it. First, can this be used as an > > > > opportunity to reorganize the system tests tree into the provider's > > > > test tree? Instead of having `tests/system/providers/{foo}`, maybe we > > > > can move those system tests alongside the other provider tests like > > > > this: > > > > > > Somehow what you wanted to add here was lost so I am not sure what the > > > proposal is :). > > > > > > > My other thought is that I think I am missing something here as far as > > > > the new organization goes. I haven't spent the time that you have spent > > > > looking at how to make this work, and I also realize this is early PoC > > > > discussion so maybe this isn't the end target, but the new paths are > > > > very long and redundant. Is that for automation reasons? For example in > > > > the image you shared: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > `airflow/providers/airbyte/hooks/airbyte.py` becomes > > > > `airflow/providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py` > > > > `tests/providers/airbyte/hooks/test_airbyte.py` becomes > > > > `airflow/providers/airbyte/tests/airflow/providers/airbyte/hooks/test_airbyte.py. > > > > > > > > > > > > > > > > > > > > > > > > > For sources, I am quite sure we cannot do it - as this would end up with > > > a non-standard package and we would have to make some non-standard > > > manipulations with import paths and it would confuse multiple IDEs like > > > crazy. > > > Our providers are in "airflow.providers.<provider>" package. The "src" is > > > the root of sources and we need to have "airflow"/"providers" as the > > > package in. > > > > > > Also this choice was largely based on the official Python packaging > > > tutorial: > > > https://packaging.python.org/en/latest/tutorials/packaging-projects/ > > > (this is where I took the "src" from - it's not been very common in > > > Python Projects - including Airflow - but it is now recommended in > > > multiple places including official Pypa tutorial: > > > packaging_tutorial/ > > > └── src/ > > > └── example_package_YOUR_USERNAME_HERE/ > > > ├── __init__.py > > > └── example.py > > > > > > > > > Regarding tests - yes, that could be removed. I even had it like that > > > initially. And I am on the fence on this one. > > > > > > Putting tests at the top level has some drawbacks. One drawback is that > > > if we put tests at the top, we miss "namespace". Having a "namespace" > > > makes tests unique even if they are defined in different providers and > > > are both put on PYTHONPATH > > > > > > Imagine provider_a defining "tests/hooks/test_utils.py" and provider_b > > > defining "tests/utils/test_utils.py". When you import > > > "tests.utils.test_utils" - which one do you import if they are all in the > > > PYTHONPATH? This will become a problem when we will make all of them > > > added to PYTHONPATH in your IDE - I think. I believe many IDEs (including > > > PyCharm and VSCode) will not work with multiple, separate python modules > > > and they will get confused when seeing multiple repeated packages with > > > the same modules on PYTHONPATH. > > > <package_name> gives us "namespace" and guarantees that each module will > > > be different. > > > > > > Also keeping them in packages is pretty consistent with some guidelines I > > > saw that "tests/<package>" is recommended. I would love however to hear > > > more from others experiences (especially TP) as maybe there are other > > > ways I have not thought about. I would also prefer "tests/test_mmm.py" > > > rather than "tests/airflow/providers/providers/test_mmm.py" - but I am > > > afraid it's going to be difficult due to the "repeated modules" problem. > > > > > >