Hey Denis, > tests/providers/airbyte/hooks > tests/providers/airbyte/operators > tests/providers/airbyte/sensors > tests/providers/airbyte/system > tests/providers/airbyte/transfers
Yes. It could be done like that - but it would require merging some packages (there are for example already "system" packages for amazon and google to keep "utils" there. I might attempt to see if this is possible to merge, but this might also be problematic. I want to be able to get the automation in place to make the migration fully automated - so that there is no manual interventions. This will allow people to review the changes, and we can test development environment integration and then re-apply the changes automatically while rebasing. This change is disruptive enough and if we try to "manipulate" with the files moving around too much, it might be too problematic to keep PR merged while reviewing it and making sure the CI is fixed and development environment works - this will take weeks to get right or even months I am afraid, with some experimentation and trying different approaches. As you mentioned - this is an early POC and there is still plenty of work to do on that one. If we find that we cannot do it in this big change, then we can always split it to a separate steps. While we will have all the providers still in the same repository, we can still make "moving system tests" a separate follow-up PR which will be far less disruptive. Hey Everyone. I also think that one deserves an AIP - just the lenght of my email indicates it. I would turn my long email into one if you think it is a good idea, this way we can keep it up-to-date and explain whatever we can come up with. I have a feeling we will get a few "u-turns" while we try different aspects of such separation. WDYT? J. On Tue, Dec 13, 2022 at 3:54 AM Ferruzzi, Dennis <ferru...@amazon.com.invalid> wrote: > > Somehow what you wanted to add here was lost so I am not sure what the > proposal is :). > > Gah, I knew I should have explained it as well. I was basically > suggesting move the system tests for a given provider into that provider's > tests module, alongside hooks, operators, sensors, etc: > > tests/providers/airbyte/hooks > tests/providers/airbyte/operators > tests/providers/airbyte/sensors > tests/providers/airbyte/system > tests/providers/airbyte/transfers > > > > > ------------------------------ > *From:* Jarek Potiuk <ja...@potiuk.com> > *Sent:* Monday, December 12, 2022 5:48 PM > *To:* dev@airflow.apache.org > *Subject:* RE: [EXTERNAL][PROPOSAL] (Internal) move of provider packages > to isolated "providers" sub-folders > > > *CAUTION*: This email originated from outside of the organization. Do not > click links or open attachments unless you can confirm the sender and know > the content is safe. > > And just to clarify - there is a little too much redundancy in your > example: > > `airflow/providers/airbyte/hooks/airbyte.py` becomes `airflow > /providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py` > > The leading "airflow" is not there in the target path: > > `airflow/providers/airbyte/hooks/airbyte.py` becomes > `providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py` > > And the redundancy comes from the fact that we have to somehow name the > directory where the entire provider is moved. Let me rewrite it differently: > > providers/airbyte <- This is where the airbyte provider is moved > | > src/airflow/providers/airbyte/hooks/airbyte.py <- this is > actual path "inside" the provider > > > So there is far less redundancy than you think. > > > On Tue, Dec 13, 2022 at 2:40 AM Jarek Potiuk <ja...@potiuk.com> wrote: > >> >>> I have two immediate thoughts on it. First, can this be used as an >>> opportunity to reorganize the system tests tree into the provider's test >>> tree? Instead of having `tests/system/providers/{foo}`, maybe we can move >>> those system tests alongside the other provider tests like this: >>> >> >> Somehow what you wanted to add here was lost so I am not sure what the >> proposal is :). >> >> >>> My other thought is that I think I am missing something here as far as >>> the new organization goes. I haven't spent the time that you have spent >>> looking at how to make this work, and I also realize this is early PoC >>> discussion so maybe this isn't the end target, but the new paths are very >>> long and redundant. Is that for automation reasons? For example in the >>> image you shared: >>> >> >> >>> `airflow/providers/airbyte/hooks/airbyte.py` becomes >>> `airflow/providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py` >>> `tests/providers/airbyte/hooks/test_airbyte.py` becomes >>> `airflow/providers/airbyte/ >>> tests/airflow/providers/airbyte/hooks/test_airbyte.py. >>> >> >> For sources, I am quite sure we cannot do it - as this would end up with >> a non-standard package and we would have to make some non-standard >> manipulations with import paths and it would confuse multiple IDEs like >> crazy. >> Our providers are in "airflow.providers.<provider>" package. The "src" is >> the root of sources and we need to have "airflow"/"providers" as the >> package in. >> >> Also this choice was largely based on the official Python packaging >> tutorial: >> https://packaging.python.org/en/latest/tutorials/packaging-projects/ >> (this is where I took the "src" from - it's not been very common in Python >> Projects - including Airflow - but it is now recommended in multiple places >> including official Pypa tutorial: >> >> packaging_tutorial/ >> └── src/ >> └── example_package_YOUR_USERNAME_HERE/ >> ├── __init__.py >> └── example.py >> >> Regarding tests - yes, that could be removed. I even had it like that >> initially. And I am on the fence on this one. >> >> Putting tests at the top level has some drawbacks. One drawback is that >> if we put tests at the top, we miss "namespace". Having a "namespace" makes >> tests unique even if they are defined in different providers and are both >> put on PYTHONPATH >> >> Imagine provider_a defining "tests/hooks/test_utils.py" and provider_b >> defining "tests/utils/test_utils.py". When you import >> "tests.utils.test_utils" - which one do you import if they are all in the >> PYTHONPATH? This will become a problem when we will make all of them added >> to PYTHONPATH in your IDE - I think. I believe many IDEs (including PyCharm >> and VSCode) will not work with multiple, separate python modules and they >> will get confused when seeing multiple repeated packages with the same >> modules on PYTHONPATH. >> <package_name> gives us "namespace" and guarantees that each module will >> be different. >> >> Also keeping them in packages is pretty consistent with some guidelines I >> saw that "tests/<package>" is recommended. I would love however to hear >> more from others experiences (especially TP) as maybe there are other ways >> I have not thought about. I would also prefer "tests/test_mmm.py" rather >> than "tests/airflow/providers/providers/test_mmm.py" - but I am afraid it's >> going to be difficult due to the "repeated modules" problem. >> >> >> >>>