Hi Jarek,

I like the general idea of the structure as you propose.

I would not "swap" as you described afterwards but propose to use the first proposal you made.

For docker and k8s tests I think it would be better to keep them outside of "airflow-core" as they have a rather integrative/system scope and are only indirectly related to core. For the moment I'd keep then and once we re-structure the packages still we can align later.

Jens

On 25.07.25 11:57, Amogh Desai wrote:
Hi Jarek,

So maybe it's a good time to improve it and come up with a convention that
we can apply also to shared folders? Such change will be very little
disruptive and can be largely automated and applied in a single PR

Yeah, I think it's a good time and idea to get such a thing done.

*> from unit.amazon import s3_operator_test*
I never paid close attention to such import patterns and I do not like
them. Its very
confusing and ambiguous, and it's very hard to tell "what package is this"
to any of those
imports.

While writing the Task SDK integration tests I did keep a lot of those
things in mind and if you
see the internal imports I have, they are of the pattern:
*from task_sdk_tests.constants import AIRFLOW_ROOT_PATH *which is far less
ambiguous and
confusing.

Note that this does not yet tackle the other "special" test types we have
-> kubernetes-tests, docker-tests, task-sdk-tests (right now being merged
by Amogh), especially the last one seem to be clashing with
`task_sdk_tests`. But I think (and this would be separate discussion) -
this also adds an opportunity (next step) to move those "special" test
types we have. Eventually we could move those tests to be "yet another test
type" under "tests":

Agreed, we can consider moving those to the right places where applicable,
like:
*task_sdk_tests/*

*├── unit/└── integration/      # Moved from task-sdk-tests/*

We can start small here and build up to fix the issues one package at a
time, and if
we can come up with a good AI prompt, we can get some work multiplexed by
the community
too.


Thanks & Regards,
Amogh Desai


On Thu, Jul 24, 2025 at 8:38 PM Jarek Potiuk <ja...@potiuk.com> wrote:

Hello here,

Since we have now started even more isolation and separation of the code
and libraries, I think it might be a good time to maybe improve a bit the
way our tests "package" hierarchy is done - in providers and elsewhere.

This question came up when i started to add pre-commit checks after our
"shared code" PR have been merged - in
https://github.com/apache/airflow/pull/53697

One problem with having tests in monorepo, is that if all "tests" folders
are added to PYTHONPATH, we cannot really put tests in the "top" level of
the tests folder (that is pretty natural whe you do not have multiple of
"tests" folders. There are two reasons for that:

* it's likely module names or package names will have the same names in
separate distributions
* quite often - if we add tests in a sub-package of tests directly they
might clash with the same names - for example from stdlib or another
package we installed. Good examples of it (and something that happened in
the past were `tests/smtp/test_*` and `tests/kubernetes/test_*`. Depending
on how your IDE and env was configured you could have ended up in "import
kubernetes" not behaving as you expected.

So far in providers we have the "{unit/system/integration}/<PROVIDER>/"
convention that solved the issue. But even me as author of it, I must admit
`from unit.amazon` when you want to cross-import between different amazon
or different providers looks rather stupid and baffling. Even recently Ash
commented in one PR "that must be wrong" - and yeah, it does look wrong....

So maybe it's a good time to improve it and come up with a convention that
we can apply also to shared folders? Such change will be very little
disruptive and can be largely automated and applied in a single PR - and
modern git and IDE integration will follow such changes nicely - so that
you can see history of changes and it resolves most of conflicts
automatically - so it could be done with very little disruption (only
currently opened PRs will have to be rebased).

My proposal would be to add a "parent" folder in the "tests" directory to
indicate where the test is coming from. That might sound like a
duplication, but I think it's a natural consequence of having code in
monorepo, and a nice way to improve the organisation of tests. Also it has
an immediate notion or where the imports come from.

airflow-core\
              src\
                  airflow\
              tests\
                    airflow_core_tests\
                                       unit\
                                       system\
                                       integration\
airflow-task-sdk\
                  src\
                      airflow\
                              sdk
                  tests\
                        task_sdk_tests\
                                      # here all tests are unit so no need
for sub-packages

providers/amazon/tests\
                        provider_tests\
                                       unit\
                                            amazon
                                       integration\
                                                  amazon


providers/cncf/kubernetes/tests\
                                 provider_tests\
                                               unit\
                                                   cncf\kubernetes\
                                               integration\
                                                           cncf\kubernetes\


I also considered swapping "unit" / "provider" (provider_tests\amazon\unit)
- but that would make it a bit more complex when we choose type of tests to
run or examples to have and I think it's somewhat better to have the
unit/system/integration distinction right after provider_tests because
essentially those test types are VERY different.

Again - that sounds like a lot of duplication in the path, but I think it's
worth it (and our pre-commits already make sure that there are no typos and
keep it in order and they can be updated to keep this new structure).

The main benefit of it hat when you import (wherever) between different
distributions - the imports will **look** better and be more obvious:

from provider_tests.unit.amazon import some_amazon_test_code
from task_sdk_tests import apis

Note that this does not yet tackle the other "special" test types we have
-> kubernetes-tests, docker-tests, task-sdk-tests (right now being merged
by Amogh), especially the last one seem to be clashing with
`task_sdk_tests`. But I think (and this would be separate discussion) -
this also adds an opportunity (next step) to move those "special" test
types we have. Eventually we could move those tests to be "yet another test
type" under "tests":

airflow_core_tests\
                    unit
                    system
                    integration
                    docker
                    kubernetes

task_sdk_tests\
               unit
               integration

But I would approach it as the next step after we reorganize imports for
the "regular" tests we have.

WDYT? Good idea? Other proposals?

J.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org

Reply via email to