+7000

This sort of restructure has been on the idea list in the back of my head for 
at least a year if not two, but didn't ever crystlaize enough to do form it in 
to a concrete proposal.
Love the prposed file structure, it was exactly what I had in mind. While it is 
verbose of having the path in src and test I like it as it makes it clear what 
the test is covering, or the inverse, if I'm looking at src/a/b.py, I know to 
expect the tests in tests/a/test_b.py. Personal preference and I wouldn't 
object if people felt strongly about reducing the depth in test.
As for choice of builder one thing I have seen mentioned is 
https://pypi.org/project/hatch/ and 
https://pypi.org/project/hatch-fancy-pypi-readme/ as a nice sweetener to go 
with that. I have not used hatch or flit myself. (I have used Poetry, and while 
I like it I don't think it would be right for Airflow as it doesn't match some 
of our expectations.)
AIP or not I don't actually mind on this one. It's a big change in terms of 
files, but not that big in terms of code or architecture of Airflow as a 
system. I think I lean slightly towards vote but no AIP needed.
-ash
On Dec 14 2022, at 11:46 am, Jarek Potiuk <ja...@potiuk.com> wrote:
> Hey Denis,
>
> > tests/providers/airbyte/hooks
> > tests/providers/airbyte/operators
> > tests/providers/airbyte/sensors
>
> > tests/providers/airbyte/system
> > tests/providers/airbyte/transfers
>
> Yes. It could be done like that - but it would require merging some packages 
> (there are for example already "system" packages for amazon and google to 
> keep "utils" there.
>
> I might attempt to see if this is possible to merge, but this might also be 
> problematic. I want to be able to get the automation in place to make the 
> migration fully automated - so that there is no manual interventions. This 
> will allow people to review the changes, and we can test development 
> environment integration and then re-apply the changes automatically while 
> rebasing.
>
> This change is disruptive enough and if we try to "manipulate" with the files 
> moving around too much, it might be too problematic to keep PR merged while 
> reviewing it and making sure the CI is fixed and development environment 
> works - this will take weeks to get right or even months I am afraid, with 
> some experimentation and trying different approaches. As you mentioned - this 
> is an early POC and there is still plenty of work to do on that one. If we 
> find that we cannot do it in this big change, then we can always split it to 
> a separate steps. While we will have all the providers still in the same 
> repository, we can still make "moving system tests" a separate follow-up PR 
> which will be far less disruptive.
>
> Hey Everyone.
>
> I also think that one deserves an AIP - just the lenght of my email indicates 
> it. I would turn my long email into one if you think it is a good idea, this 
> way we can keep it up-to-date and explain whatever we can come up with. I 
> have a feeling we will get a few "u-turns" while we try different aspects of 
> such separation.
>
> WDYT?
>
> J.
>
>
>
> On Tue, Dec 13, 2022 at 3:54 AM Ferruzzi, Dennis 
> <ferru...@amazon.com.invalid> wrote:
> > > Somehow what you wanted to add here was lost so I am not sure what the 
> > > proposal is :).
> >
> >
> > Gah, I knew I should have explained it as well. I was basically suggesting 
> > move the system tests for a given provider into that provider's tests 
> > module, alongside hooks, operators, sensors, etc:
> >
> > tests/providers/airbyte/hooks
> > tests/providers/airbyte/operators
> > tests/providers/airbyte/sensors
> >
> > tests/providers/airbyte/system
> > tests/providers/airbyte/transfers
> >
> >
> >
> >
> >
> > From: Jarek Potiuk <ja...@potiuk.com (mailto:ja...@potiuk.com)>
> > Sent: Monday, December 12, 2022 5:48 PM
> > To: dev@airflow.apache.org (mailto:dev@airflow.apache.org)
> > Subject: RE: [EXTERNAL][PROPOSAL] (Internal) move of provider packages to 
> > isolated "providers" sub-folders
> >
> >
> >
> > CAUTION: This email originated from outside of the organization. Do not 
> > click links or open attachments unless you can confirm the sender and know 
> > the content is safe.
> >
> >
> >
> >
> >
> >
> >
> >
> > And just to clarify - there is a little too much redundancy in your example:
> >
> > `airflow/providers/airbyte/hooks/airbyte.py` becomes 
> > `airflow/providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py`
> >
> > The leading "airflow" is not there in the target path:
> >
> > `airflow/providers/airbyte/hooks/airbyte.py` becomes 
> > `providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py`
> >
> > And the redundancy comes from the fact that we have to somehow name the 
> > directory where the entire provider is moved. Let me rewrite it differently:
> >
> > providers/airbyte <- This is where the airbyte provider is moved
> > |
> > src/airflow/providers/airbyte/hooks/airbyte.py <- this is actual path 
> > "inside" the provider
> >
> >
> >
> > So there is far less redundancy than you think.
> >
> >
> > On Tue, Dec 13, 2022 at 2:40 AM Jarek Potiuk <ja...@potiuk.com 
> > (mailto:ja...@potiuk.com)> wrote:
> > > >
> > > > I have two immediate thoughts on it. First, can this be used as an 
> > > > opportunity to reorganize the system tests tree into the provider's 
> > > > test tree? Instead of having `tests/system/providers/{foo}`, maybe we 
> > > > can move those system tests alongside the other provider tests like 
> > > > this:
> > >
> > > Somehow what you wanted to add here was lost so I am not sure what the 
> > > proposal is :).
> > >
> > > > My other thought is that I think I am missing something here as far as 
> > > > the new organization goes. I haven't spent the time that you have spent 
> > > > looking at how to make this work, and I also realize this is early PoC 
> > > > discussion so maybe this isn't the end target, but the new paths are 
> > > > very long and redundant. Is that for automation reasons? For example in 
> > > > the image you shared:
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > > > `airflow/providers/airbyte/hooks/airbyte.py` becomes 
> > > > `airflow/providers/airbyte/src/airflow/providers/airbyte/hooks/airbyte.py`
> > > > `tests/providers/airbyte/hooks/test_airbyte.py` becomes 
> > > > `airflow/providers/airbyte/tests/airflow/providers/airbyte/hooks/test_airbyte.py.
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > For sources, I am quite sure we cannot do it - as this would end up with 
> > > a non-standard package and we would have to make some non-standard 
> > > manipulations with import paths and it would confuse multiple IDEs like 
> > > crazy.
> > > Our providers are in "airflow.providers.<provider>" package. The "src" is 
> > > the root of sources and we need to have "airflow"/"providers" as the 
> > > package in.
> > >
> > > Also this choice was largely based on the official Python packaging 
> > > tutorial: 
> > > https://packaging.python.org/en/latest/tutorials/packaging-projects/ 
> > > (this is where I took the "src" from - it's not been very common in 
> > > Python Projects - including Airflow - but it is now recommended in 
> > > multiple places including official Pypa tutorial:
> > > packaging_tutorial/
> > > └── src/
> > > └── example_package_YOUR_USERNAME_HERE/
> > > ├── __init__.py
> > > └── example.py
> > >
> > >
> > > Regarding tests - yes, that could be removed. I even had it like that 
> > > initially. And I am on the fence on this one.
> > >
> > > Putting tests at the top level has some drawbacks. One drawback is that 
> > > if we put tests at the top, we miss "namespace". Having a "namespace" 
> > > makes tests unique even if they are defined in different providers and 
> > > are both put on PYTHONPATH
> > >
> > > Imagine provider_a defining "tests/hooks/test_utils.py" and provider_b 
> > > defining "tests/utils/test_utils.py". When you import 
> > > "tests.utils.test_utils" - which one do you import if they are all in the 
> > > PYTHONPATH? This will become a problem when we will make all of them 
> > > added to PYTHONPATH in your IDE - I think. I believe many IDEs (including 
> > > PyCharm and VSCode) will not work with multiple, separate python modules 
> > > and they will get confused when seeing multiple repeated packages with 
> > > the same modules on PYTHONPATH.
> > > <package_name> gives us "namespace" and guarantees that each module will 
> > > be different.
> > >
> > > Also keeping them in packages is pretty consistent with some guidelines I 
> > > saw that "tests/<package>" is recommended. I would love however to hear 
> > > more from others experiences (especially TP) as maybe there are other 
> > > ways I have not thought about. I would also prefer "tests/test_mmm.py" 
> > > rather than "tests/airflow/providers/providers/test_mmm.py" - but I am 
> > > afraid it's going to be difficult due to the "repeated modules" problem.
> > >
> > >

Reply via email to