> Also it might improve performance a bit, mainly as they would run with
xdist in parallel.

Absolutely. That's one of the driving changes for me to make all our
Provider tests non-DB ones. Xdist introduces it's own set of challenges of
course - ie. it makes things more prone to side-effects, but so far it has
played relatively well for us. And it allows us to spot some usage patterns
where we are relying on "no-side effects" and get rid of those (example is
the "caplog" one which we just consented to get rid of because it makes us
prone to unintended side effects - and we get rid of those not by fixing
them, but by making our code more resilient to those side-effects.

I would - one day - want to enable order-randomisation in our tests, which
is the ultimate way of making sure that there are no side effects. So far
any time i tried it on providers, there were so many side-effects that it
would be hugely detrimental to the sanity of our contributors, but if we
gradually improve our test suite, get rid of DB and side-effects, and
carefully enable randomisation - first per module, then per package and
finally per all test suite, we might eventually get there. And Xdist is one
of the ways to gently introduce it because it introduced a pretty
"controlled" randomisation - i.e. mostly tests are executed in the same
order but rarely there are variations due to adding new tests or some
environmental conditions like speed of disk or CPU.

J.



On Sat, Feb 8, 2025 at 12:18 PM Jens Scheffler <j_scheff...@gmx.de.invalid>
wrote:

> Hi there,
>
> so then I hear that this is a very constructive proposal - compared to
> clearing/writing/using connections for pytests in the DB there should
> rather be a generic Fixture that can build on top of Monkeypatch.Setenv.
> That saves time and as well can convert these tests to be non-DB,
> correct? Also it might improve performance a bit, mainly as they would
> run with xdist in parallel.
>
> Jens
>
> On 07.02.25 23:10, Ash Berlin-Taylor wrote:
> > Mock and Connection don’t make sense together. A Connection is
> effectively a dataclass that is loaded from the DB. There’s nothing to mock.
> >
> > Also are you aware that you can set `ARIFLOW_CONN_*` environment
> variables and those will be looked at before the DB. That plus
> [monkeypatch.setenv](
> https://docs.pytest.org/en/stable/reference/reference.html#pytest.MonkeyPatch.setenv)
> might be all we need, and a custom fixture in tests_common/pytest_plugin.py?
> >
> > Though again, I question how reading a row from the DB by it’s PK can be
> slow even if you read it 1000 times in a test…
> >
> >
> >
> >> On 7 Feb 2025, at 17:43, Blain David <david.bl...@infrabel.be> wrote:
> >>
> >>> Pulling a connection from the DB itself shouldn’t/can’t be slow - It’s
> a single row. I think I’m just confused or misdirected about your comment
> about database here. Can you give a concrete example of the change you
> would make, and how this will speed things up?
> >> It's not only pulling, I saw some tests creating multiple test
> connections before each tests, and the those connections being retrieved
> multiple times within tests, not that it's super slow, but it won't make
> tests lightning fast either imho.
> >>
> >>> So what are you actually proposing?
> >> I would propose a common test helper function which returns a mocked
> Airflow connection, and for the providers we will need to mock the SDK once
> direct DB connection won't be possible anymore (which is good).
> >>
> >>> We have to be aware of making our tests overly fragile if we replace
> everything with mocks, then we are only testing our mocks and not the real
> behaviour.
> >> Normally this behaviour should be tested in dedicated integration
> tests, not on each unit test for hooks/operators/sensors/triggers and could
> be tested in a generic matter imho.  That’s what I mean with "slow down",
> too much tests are doing this (especially in the providers).  But I agree
> (and very good point) we should be careful and make sure we don't purely
> rely tests on mocks, hence why integration tests.  I've stumbled this week
> on a test in Airflow provider where the mocked feature was indeed making
> the test succeed, but was in fact lacking a real implementation (e.g.
> Elasticsearch) thus not working at all.  Such things (only testing mocks or
> tests that in fact test nothing like you suggest) can be detected through
> mutation testing, but those are slow and expensive, not that they should
> run each time (maybe only on master for example), but could help monitor
> the quality of the tests, so it would have detected the case I described
> and the point that you made (e.g. fragile tests).
> >>
> >> -----Original Message-----
> >> From: Ash Berlin-Taylor <a...@apache.org <mailto:a...@apache.org>>
> >> Sent: Friday, 7 February 2025 10:28
> >> To: dev@airflow.apache.org <mailto:dev@airflow.apache.org>
> >> Subject: Re: [PROPOSAL] Remove creation of real Airflow connections in
> provider unit tests
> >>
> >> EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze
> niet vertrouwt, klik niet op een link of open geen bijlages. Bij twijfel,
> stuur deze e-mail als bijlage naar ab...@infrabel.be <mailto:
> ab...@infrabel.be><mailto:ab...@infrabel.be>.
> >>
> >> The providers tests will soon (but possibly not before 3.0 at this
> point) need to be converted to use the TaskSDK properly which won’t/can't
> actually use the DB, so we will need to do something soon.
> >>
> >>> Hence that’s why when I do refactorings in provider unit tests, I’ve
> >>> already replaced those real connections with mocked ones making tests
> >>> run faster locally (no database needed)
> >> Pulling a connection from the DB itself shouldn’t/can’t be slow - It’s
> a single row. I think I’m just confused or misdirected about your comment
> about database here. Can you give a concrete example of the change you
> would make, and how this will speed things up?
> >>
> >> To my mind creating/obtaining the Connection object isn’t the slow
> part, but doing anything with that connection. But connections don’t
> actually do the connecting/opening sockets/network requests — that’s all in
> the Hook classes.
> >>
> >> So what are you actually proposing?
> >>
> >> We have to be aware of making our tests overly fragile if we replace
> everything with mocks, then we are only testing our mocks and not the real
> behaviour.
> >>
> >> -ash
> >>
> >>
> >>> On 7 Feb 2025, at 08:50, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>
> >>> +10 on that. My next step after finishing Provider's move, was to make
> >>> essentially all unit tests in Providers non-DB tests and removing
> >>> "real connection" usage is part of it.
> >>>
> >>> This is essentially stage 3 of
> >>> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> >>> ub.com <http://ub.com/
> >%2Fapache%2Fairflow%2Fissues%2F42632&data=05%7C02%7Cdavid.blain%
> >>> 40infrabel.be%7Cc6eefb61c10240de8c0d08dd4759d0d0%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638745173290138741%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=XZMKiZvZqGAUmm9tNXbewmoZW%2ByaOXyzGmoA%2BaRWEYE%3D&reserved=0
> that is planned and I want to make POC and indeed involve others in
> crowd-sourcing the change (similar to provider's move) after I figure out
> how to do it.
> >>>
> >>> J.
> >>>
> >>>
> >>> On Fri, Feb 7, 2025 at 8:35 AM Blain David <david.bl...@infrabel.be>
> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>>
> >>>>
> >>>> The caplog vote triggered me to launch this proposal as it’s also
> >>>> related to unit testing, and as I think we want our unit tests as
> >>>> clean and as simple and as fast as possible.
> >>>>
> >>>> I think it would be a good practise to not define and create real
> >>>> Airflow connections within the providers unit tests (which use the
> >>>> Airflow test database), as normally when writing unit tests those
> >>>> should be isolated and not be depend on any external systems like a
> database.
> >>>>
> >>>>
> >>>>
> >>>> Also in my case those make the tests to run slower.  Beside that I
> >>>> ‘ve also noticed when working on some PR regarding providers,
> >>>> sometimes there are some glitches within the CI/CD which seem to
> >>>> cause issues with those “real” connections, causing tests to randomly
> fail.
> >>>>
> >>>> Hence that’s why when I do refactorings in provider unit tests, I’ve
> >>>> already replaced those real connections with mocked ones making tests
> >>>> run faster locally (no database needed) and no more random failures
> >>>> during tests (possibly preceding tests that mess up connections).
> >>>>
> >>>> That’s doesn’t mean we don’t want to use the database of course
> >>>> during tests, I’m just saying it’s a bit of overkill to use a
> >>>> database in a unit test just to get a connection.
> >>>>
> >>>>
> >>>>
> >>>> We could also create a common mocking method for connections in
> >>>> tests_common and use it across all unit tests, now those are mostly
> >>>> redefined across different provider tests.
> >>>>
> >>>>
> >>>>
> >>>> Of course I’m willing to contribute on this one, what do you think
> >>>> about this idea?  Personally, I think this can only make maintenance
> >>>> easier (and prevent random failures and faster tests results).
> >>>>
> >>>>
> >>>>
> >>>> Curious of your thoughts.
> >>>>
> >>>>
> >>>>
> >>>> Kind regards,
> >>>>
> >>>> David
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> *David Blain*
> >>>>
> >>>> Data Engineer *at* ICT-514 - BI End User Reporting
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org <mailto:
> dev-unsubscr...@airflow.apache.org>
> >> For additional commands, e-mail: dev-h...@airflow.apache.org <mailto:
> dev-h...@airflow.apache.org>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>

Reply via email to