On 7 Feb 2025, at 17:43, Blain David <david.bl...@infrabel.be> wrote:
Pulling a connection from the DB itself shouldn’t/can’t be slow - It’s a single
row. I think I’m just confused or misdirected about your comment about database
here. Can you give a concrete example of the change you would make, and how
this will speed things up?
It's not only pulling, I saw some tests creating multiple test connections
before each tests, and the those connections being retrieved multiple times
within tests, not that it's super slow, but it won't make tests lightning fast
either imho.
So what are you actually proposing?
I would propose a common test helper function which returns a mocked Airflow
connection, and for the providers we will need to mock the SDK once direct DB
connection won't be possible anymore (which is good).
We have to be aware of making our tests overly fragile if we replace everything
with mocks, then we are only testing our mocks and not the real behaviour.
Normally this behaviour should be tested in dedicated integration tests, not on each unit
test for hooks/operators/sensors/triggers and could be tested in a generic matter imho.
That’s what I mean with "slow down", too much tests are doing this (especially
in the providers). But I agree (and very good point) we should be careful and make sure
we don't purely rely tests on mocks, hence why integration tests. I've stumbled this
week on a test in Airflow provider where the mocked feature was indeed making the test
succeed, but was in fact lacking a real implementation (e.g. Elasticsearch) thus not
working at all. Such things (only testing mocks or tests that in fact test nothing like
you suggest) can be detected through mutation testing, but those are slow and expensive,
not that they should run each time (maybe only on master for example), but could help
monitor the quality of the tests, so it would have detected the case I described and the
point that you made (e.g. fragile tests).
-----Original Message-----
From: Ash Berlin-Taylor <a...@apache.org <mailto:a...@apache.org>>
Sent: Friday, 7 February 2025 10:28
To: dev@airflow.apache.org <mailto:dev@airflow.apache.org>
Subject: Re: [PROPOSAL] Remove creation of real Airflow connections in provider
unit tests
EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze niet vertrouwt, klik
niet op een link of open geen bijlages. Bij twijfel, stuur deze e-mail als bijlage naar
ab...@infrabel.be <mailto:ab...@infrabel.be><mailto:ab...@infrabel.be>.
The providers tests will soon (but possibly not before 3.0 at this point) need
to be converted to use the TaskSDK properly which won’t/can't actually use the
DB, so we will need to do something soon.
Hence that’s why when I do refactorings in provider unit tests, I’ve
already replaced those real connections with mocked ones making tests
run faster locally (no database needed)
Pulling a connection from the DB itself shouldn’t/can’t be slow - It’s a single
row. I think I’m just confused or misdirected about your comment about database
here. Can you give a concrete example of the change you would make, and how
this will speed things up?
To my mind creating/obtaining the Connection object isn’t the slow part, but
doing anything with that connection. But connections don’t actually do the
connecting/opening sockets/network requests — that’s all in the Hook classes.
So what are you actually proposing?
We have to be aware of making our tests overly fragile if we replace everything
with mocks, then we are only testing our mocks and not the real behaviour.
-ash
On 7 Feb 2025, at 08:50, Jarek Potiuk <ja...@potiuk.com> wrote:
+10 on that. My next step after finishing Provider's move, was to make
essentially all unit tests in Providers non-DB tests and removing
"real connection" usage is part of it.
This is essentially stage 3 of
https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
ub.com
<http://ub.com/>%2Fapache%2Fairflow%2Fissues%2F42632&data=05%7C02%7Cdavid.blain%
40infrabel.be%7Cc6eefb61c10240de8c0d08dd4759d0d0%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638745173290138741%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=XZMKiZvZqGAUmm9tNXbewmoZW%2ByaOXyzGmoA%2BaRWEYE%3D&reserved=0
that is planned and I want to make POC and indeed involve others in crowd-sourcing the
change (similar to provider's move) after I figure out how to do it.
J.
On Fri, Feb 7, 2025 at 8:35 AM Blain David <david.bl...@infrabel.be> wrote:
Hello,
The caplog vote triggered me to launch this proposal as it’s also
related to unit testing, and as I think we want our unit tests as
clean and as simple and as fast as possible.
I think it would be a good practise to not define and create real
Airflow connections within the providers unit tests (which use the
Airflow test database), as normally when writing unit tests those
should be isolated and not be depend on any external systems like a database.
Also in my case those make the tests to run slower. Beside that I
‘ve also noticed when working on some PR regarding providers,
sometimes there are some glitches within the CI/CD which seem to
cause issues with those “real” connections, causing tests to randomly fail.
Hence that’s why when I do refactorings in provider unit tests, I’ve
already replaced those real connections with mocked ones making tests
run faster locally (no database needed) and no more random failures
during tests (possibly preceding tests that mess up connections).
That’s doesn’t mean we don’t want to use the database of course
during tests, I’m just saying it’s a bit of overkill to use a
database in a unit test just to get a connection.
We could also create a common mocking method for connections in
tests_common and use it across all unit tests, now those are mostly
redefined across different provider tests.
Of course I’m willing to contribute on this one, what do you think
about this idea? Personally, I think this can only make maintenance
easier (and prevent random failures and faster tests results).
Curious of your thoughts.
Kind regards,
David
*David Blain*
Data Engineer *at* ICT-514 - BI End User Reporting
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
<mailto:dev-unsubscr...@airflow.apache.org>
For additional commands, e-mail: dev-h...@airflow.apache.org
<mailto:dev-h...@airflow.apache.org>