> I'm glad you landed on the unit/integration/system naming convention, I
like that but I'm curious what would go in the integration path.  Before I
started with Airflow, I had been taught that integration test are what we
call system tests.  I've come to use the two terms more or less
interchangeably.  What I am familiar without outside of Airflow is unit
tests are within a module and integration tests are between modules.  Under
that definition, most (all?) of what we currently call system tests are
integration tests.  Maybe it's an "all squares are rectangles but not all
rectangles are squares" type definition and I'm missing some distinction
between the terms.  I guess some of what we have in the existing unit tests
would be more accurately integ tests, so maybe that's what you have in mind.

Funny you mention this - even today when driving when I was reading to "The
real Python" Podcast from earlier this week about TDD vs. BDD
https://realpython.com/podcasts/rpp/239/ and Christopher Bailey and
Christopher Trudeau have been explaining all kinds of testing and "Test
pyramid" and they explicitly mention how blurry and imperfect the
nomenclature is for tests :D. If you look for it, you will find several
variants of the "Test pyramid" with anything from 3 to 7 layers and with
names that are sometimes the same but they are different places in
different pyramids - sometimes even the same names reverted in different
variants.

According to some people even our unit tests (the db ones especially) are
really hardly "unit tests" - unit tests should focus on testing class or
function, and in many cases our unit tests are testing .... more ... and db
... and sometimes networking ... and .......

So I'd say - "unit", "integration" and "system" are names that we should
agree on and define in Airflow to know what they really mean, but I don't
think there is a universally standard definition for either of those that
can be applied to what we do in airflow. And I think we have to come up
with our own convention. Actually we have even more types of tests - docker
compose tests, kubernetes tests, helm unit tests ....  (but they are not in
providers,- and as part of our refactor we will get to those eventually
when we will be moving our project structure:

Currently, full list of types of tests we have and detailed explanation
what each test type is are in
https://github.com/apache/airflow/blob/main/contributing-docs/09_testing.rst

And let me quote it here:


   - Unit tests
   
<https://github.com/apache/airflow/blob/main/contributing-docs/testing/unit_tests.rst>
are
   Python tests that do not require any additional integrations. Unit tests
   are available both in the Breeze environment
   <https://github.com/apache/airflow/blob/main/dev/breeze/doc/README.rst>
    and local virtualenv
   
<https://github.com/apache/airflow/blob/main/contributing-docs/07_local_virtualenv.rst>.
   Note that in order for a pull request to be reviewed, *it should have a
   unit test*, unless a test is not required i.e. for documentation changes.
   - Integration tests
   
<https://github.com/apache/airflow/blob/main/contributing-docs/testing/integration_tests.rst>
are
   available in the Breeze environment
   <https://github.com/apache/airflow/blob/main/dev/breeze/doc/README.rst> that
   is also used for Airflow CI tests. Integration tests are special tests that
   require additional services running, such as Postgres, MySQL, Kerberos, etc.
   - Docker Compose tests
   
<https://github.com/apache/airflow/blob/main/contributing-docs/testing/docker_compose_tests.rst>
are
   tests we run to check if our quick start docker-compose works.
   - Kubernetes tests
   
<https://github.com/apache/airflow/blob/main/contributing-docs/testing/k8s_tests.rst>
are
   tests we run to check if our Kubernetes deployment and Kubernetes Pod
   Operator works.
   - Helm unit tests
   
<https://github.com/apache/airflow/blob/main/contributing-docs/testing/helm_unit_tests.rst>
are
   tests we run to verify if Helm Chart is rendered correctly for various
   configuration parameters.
   - System tests
   
<https://github.com/apache/airflow/blob/main/contributing-docs/testing/system_tests.rst>
are
   automatic tests that use external systems like Google Cloud and AWS. These
   tests are intended for an end-to-end DAG execution.

You can also run other kinds of tests when you are developing airflow
packages:

   - Testing packages
   
<https://github.com/apache/airflow/blob/main/contributing-docs/testing/testing_packages.rst>
is
   a document that describes how to manually build and test pre-release
   candidate packages of airflow and providers.
   - Python client tests
   
<https://github.com/apache/airflow/blob/main/contributing-docs/testing/python_client_tests.rst>
are
   tests we run to check if the Python API client works correctly.
   - DAG testing
   
<https://github.com/apache/airflow/blob/main/contributing-docs/testing/dag_testing.rst>
is
   a document that describes how to test DAGs in a local environment with
   DebugExecutor. Note, that this is a legacy method - you can now use
   dag.test() method to test DAGs.

So the "system"/ "integration"/ "unit" we have in Providers - is exactly
reflecting the terminology and definition we defined a long time ago. Of
course - we can still change if we think it's worth it. BUT IMHO - because
there are no "standards" - in any project you have to learn what a
particular type of test really is. And in Airflow we have a very complete
and well documented definition (follow the links to see it). So we are at
least consistent "internally".

J.




On Tue, Feb 18, 2025 at 7:51 PM Ferruzzi, Dennis
<ferru...@amazon.com.invalid> wrote:

> Looks like this was discussed, decided, and done over the weekend, but
> just wanted to say thanks for that.  The redundancy in he import paths has
> been bugging me LOL
>
> I'm glad you landed on the unit/integration/system naming convention, I
> like that but I'm curious what would go in the integration path.  Before I
> started with Airflow, I had been taught that integration test are what we
> call system tests.  I've come to use the two terms more or less
> interchangeably.  What I am familiar without outside of Airflow is unit
> tests are within a module and integration tests are between modules.  Under
> that definition, most (all?) of what we currently call system tests are
> integration tests.  Maybe it's an "all squares are rectangles but not all
> rectangles are squares" type definition and I'm missing some distinction
> between the terms.  I guess some of what we have in the existing unit tests
> would be more accurately integ tests, so maybe that's what you have in mind.
>
> Anyway, thanks for another major cleanup task!
>
>
>
>  - ferruzzi
>
>
> ________________________________
> From: Jarek Potiuk <ja...@potiuk.com>
> Sent: Sunday, February 16, 2025 3:48 AM
> To: dev@airflow.apache.org
> Subject: RE: [EXT] [DISCUSS] Tests structure for providers
>
> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
>
>
> AVERTISSEMENT: Ce courrier électronique provient d’un expéditeur externe.
> Ne cliquez sur aucun lien et n’ouvrez aucune pièce jointe si vous ne pouvez
> pas confirmer l’identité de l’expéditeur et si vous n’êtes pas certain que
> le contenu ne présente aucun risque.
>
>
>
> And yes Ash -> ruff rule it is to ban "providers" imports
> https://github.com/apache/airflow/pull/46801 (and it found two places
> where
> they were still used)
>
> On Sun, Feb 16, 2025 at 12:35 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > And merged! that was fast. Thanks Bugra, Ash, Jens !
> >
> > On Sun, Feb 16, 2025 at 11:40 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> >> https://github.com/apache/airflow/pull/46800 -> there you go :) . Happy
> >> reviewing of 1570+ files changed.
> >>
> >> BTW. It's way easier to review it with `git diff` than with Github UI :)
> >>
> >> On Sun, Feb 16, 2025 at 11:32 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>
> >>> Naaa. that one is super easy to implement and automate and it's better
> >>> to have one quick "surgical" cut to rename all of those :), I am about
> to
> >>> send PR with all of those changes in a moment.
> >>>
> >>> On Sun, Feb 16, 2025 at 11:26 AM Buğra Öztürk <ozturkbugr...@gmail.com
> >
> >>> wrote:
> >>>
> >>>> Amazing!:) Maybe we can divide and conquer this on structure changes
> and
> >>>> pre-commit rather than leaving you with all :)
> >>>>
> >>>> On Sun, 16 Feb 2025, 10:45 Jarek Potiuk, <ja...@potiuk.com> wrote:
> >>>>
> >>>> > Yeah "unit" seems to be the way to go :) . I will make another giant
> >>>> PR
> >>>> > (but this time it should be even easier to review) to convert to it
> :)
> >>>> >
> >>>> > On Sun, Feb 16, 2025 at 10:41 AM Buğra Öztürk <
> >>>> ozturkbugr...@gmail.com>
> >>>> > wrote:
> >>>> >
> >>>> > > Thanks for all the effort on this! It looks great!
> >>>> > > - The approach looks great!
> >>>> > > - Heavy plus one on this. The pre-commit should be one of the
> ground
> >>>> > > pillars of any standardisation. Having an extra file structure
> check
> >>>> > could
> >>>> > > be included too on top of `from` to eliminate new providers to
> >>>> follow the
> >>>> > > consistent structure.
> >>>> > > - I like the unit too over provider_tests. It is a great idea to
> >>>> > eliminate
> >>>> > > ambiguity. The other parts seem pretty pretty good.
> >>>> > >
> >>>> > > Along with the latest improvements, these will be great additions!
> >>>> > >
> >>>> > >
> >>>> > > On Sun, Feb 16, 2025 at 4:15 AM Rahul Vats <
> rah.sharm...@gmail.com>
> >>>> > wrote:
> >>>> > >
> >>>> > > > Thanks for the detailed explanation and for driving this
> effort—it
> >>>> > looks
> >>>> > > > like a solid improvement.
> >>>> > > >
> >>>> > > >    1.
> >>>> > > >
> >>>> > > >    I love the approach & new structure. I can't think of a
> better
> >>>> > > approach.
> >>>> > > >    2.
> >>>> > > >
> >>>> > > >    I agree that adding a pre-commit check to enforce consistent
> >>>> imports
> >>>> > > is
> >>>> > > >    a good idea. It will help maintain the new structure and
> >>>> prevent
> >>>> > > >    regressions.
> >>>> > > >    3.
> >>>> > > >
> >>>> > > >    For the naming, I’d prefer using unit instead of
> >>>> provider_tests to
> >>>> > > keep
> >>>> > > >    it more intuitive and consistent with system and integration.
> >>>> This
> >>>> > > makes
> >>>> > > >    the structure self-explanatory and easier for new
> contributors.
> >>>> > > >
> >>>> > > >
> >>>> > > > Regards,
> >>>> > > > Rahul Vats
> >>>> > > >
> >>>> > > > On Sun, 16 Feb 2025 at 03:50, Jarek Potiuk <ja...@potiuk.com>
> >>>> wrote:
> >>>> > > >
> >>>> > > > > > 2) no real opinion, seems useful. Let’s see if we can do it
> >>>> with a
> >>>> > > ruff
> >>>> > > > > rule first? It _might_ be possible. (It is possible to extend
> >>>> the
> >>>> > ruff
> >>>> > > > > rules on a per subproject basis, check out
> >>>> task_sdk/pyproject.toml)
> >>>> > > > >
> >>>> > > > > I am not sure if we need ruff rule for it, as we have not yet
> >>>> enabled
> >>>> > > any
> >>>> > > > > `ruff` airflow rules in airflow development (it is targeted so
> >>>> far
> >>>> > for
> >>>> > > > > users writing DAGs}
> >>>> > > > >
> >>>> > > > > I think it's more "airflow internal" rule not something that
> >>>> can be
> >>>> > > used
> >>>> > > > > for general audience (which is pretty-much what ruff rules are
> >>>> about)
> >>>> > > > >
> >>>> > > > > Unfortunately - the current way ruff works does not allow
> local
> >>>> > > > > "plugin" functionality to add new rules that would only be
> >>>> applicable
> >>>> > > to
> >>>> > > > > the current project you are in. In the absence of that
> feature,
> >>>> > having
> >>>> > > a
> >>>> > > > > python pre-commit seem the next-best thing.
> >>>> > > > >
> >>>> > > > > J.
> >>>> > > > >
> >>>> > > > >
> >>>> > > > > On Sat, Feb 15, 2025 at 10:50 PM Ash Berlin-Taylor <
> >>>> a...@apache.org>
> >>>> > > > wrote:
> >>>> > > > >
> >>>> > > > > > 100%. It’s on my list to tackle very soon, it can’t get put
> >>>> off
> >>>> > much
> >>>> > > > > > longer.
> >>>> > > > > >
> >>>> > > > > > I have some thoughts (around back compat mostly) but they
> >>>> aren’t
> >>>> > > > > collected
> >>>> > > > > > yet, and I won’t derail this thread.
> >>>> > > > > >
> >>>> > > > > > > On 15 Feb 2025, at 21:43, Jarek Potiuk <ja...@potiuk.com>
> >>>> wrote:
> >>>> > > > > > >
> >>>> > > > > > > cc: @ashb @uranusjr @kaxil -> something for consideration
> >>>> in our
> >>>> > > > > > > discussions on how to repackage airlfow soon. I keep on
> >>>> > explaining
> >>>> > > > why
> >>>> > > > > > > running code in airflow.__init__.py is a bad idea and
> >>>> advocating
> >>>> > > for
> >>>> > > > > > > removal of it and replace it with explicit initialization,
> >>>> yet
> >>>> > that
> >>>> > > > > topic
> >>>> > > > > > > have not been discussed yet, but I will plan to start a
> >>>> > discussion
> >>>> > > > > about
> >>>> > > > > > it
> >>>> > > > > > > soon once we approach the packaging subject. I am not sure
> >>>> what's
> >>>> > > > your
> >>>> > > > > > > thinking is about this - I know you spent consirderable
> >>>> amount of
> >>>> > > > time
> >>>> > > > > on
> >>>> > > > > > > doing all the "lazy initalization" dance all over the
> >>>> places,
> >>>> > and I
> >>>> > > > > think
> >>>> > > > > > > it adds a lot of complexity to our code and only partially
> >>>> solves
> >>>> > > the
> >>>> > > > > > > cicular imports problem. But I know @ashb has very strong
> >>>> feeling
> >>>> > > > about
> >>>> > > > > > > being able to do "from airlfow import Dag" - which more or
> >>>> less
> >>>> > > > > requires
> >>>> > > > > > > all this complexity. I just don't think it's worth it.
> >>>> > > > > >
> >>>> > > > > >
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> > >
> >>>> > > --
> >>>> > > Bugra Ozturk
> >>>> > >
> >>>> >
> >>>>
> >>>
>

Reply via email to