Mock and Connection don’t make sense together. A Connection is effectively a dataclass that is loaded from the DB. There’s nothing to mock.
Also are you aware that you can set `ARIFLOW_CONN_*` environment variables and those will be looked at before the DB. That plus [monkeypatch.setenv](https://docs.pytest.org/en/stable/reference/reference.html#pytest.MonkeyPatch.setenv) might be all we need, and a custom fixture in tests_common/pytest_plugin.py? Though again, I question how reading a row from the DB by it’s PK can be slow even if you read it 1000 times in a test… > On 7 Feb 2025, at 17:43, Blain David <david.bl...@infrabel.be> wrote: > >> Pulling a connection from the DB itself shouldn’t/can’t be slow - It’s a >> single row. I think I’m just confused or misdirected about your comment >> about database here. Can you give a concrete example of the change you would >> make, and how this will speed things up? > > It's not only pulling, I saw some tests creating multiple test connections > before each tests, and the those connections being retrieved multiple times > within tests, not that it's super slow, but it won't make tests lightning > fast either imho. > >> So what are you actually proposing? > > I would propose a common test helper function which returns a mocked Airflow > connection, and for the providers we will need to mock the SDK once direct DB > connection won't be possible anymore (which is good). > >> We have to be aware of making our tests overly fragile if we replace >> everything with mocks, then we are only testing our mocks and not the real >> behaviour. > > Normally this behaviour should be tested in dedicated integration tests, not > on each unit test for hooks/operators/sensors/triggers and could be tested in > a generic matter imho. That’s what I mean with "slow down", too much tests > are doing this (especially in the providers). But I agree (and very good > point) we should be careful and make sure we don't purely rely tests on > mocks, hence why integration tests. I've stumbled this week on a test in > Airflow provider where the mocked feature was indeed making the test succeed, > but was in fact lacking a real implementation (e.g. Elasticsearch) thus not > working at all. Such things (only testing mocks or tests that in fact test > nothing like you suggest) can be detected through mutation testing, but those > are slow and expensive, not that they should run each time (maybe only on > master for example), but could help monitor the quality of the tests, so it > would have detected the case I described and the point that you made (e.g. > fragile tests). > > -----Original Message----- > From: Ash Berlin-Taylor <a...@apache.org <mailto:a...@apache.org>> > Sent: Friday, 7 February 2025 10:28 > To: dev@airflow.apache.org <mailto:dev@airflow.apache.org> > Subject: Re: [PROPOSAL] Remove creation of real Airflow connections in > provider unit tests > > EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze niet > vertrouwt, klik niet op een link of open geen bijlages. Bij twijfel, stuur > deze e-mail als bijlage naar ab...@infrabel.be > <mailto:ab...@infrabel.be><mailto:ab...@infrabel.be>. > > The providers tests will soon (but possibly not before 3.0 at this point) > need to be converted to use the TaskSDK properly which won’t/can't actually > use the DB, so we will need to do something soon. > >> Hence that’s why when I do refactorings in provider unit tests, I’ve >> already replaced those real connections with mocked ones making tests >> run faster locally (no database needed) > > Pulling a connection from the DB itself shouldn’t/can’t be slow - It’s a > single row. I think I’m just confused or misdirected about your comment about > database here. Can you give a concrete example of the change you would make, > and how this will speed things up? > > To my mind creating/obtaining the Connection object isn’t the slow part, but > doing anything with that connection. But connections don’t actually do the > connecting/opening sockets/network requests — that’s all in the Hook classes. > > So what are you actually proposing? > > We have to be aware of making our tests overly fragile if we replace > everything with mocks, then we are only testing our mocks and not the real > behaviour. > > -ash > > >> On 7 Feb 2025, at 08:50, Jarek Potiuk <ja...@potiuk.com> wrote: >> >> +10 on that. My next step after finishing Provider's move, was to make >> essentially all unit tests in Providers non-DB tests and removing >> "real connection" usage is part of it. >> >> This is essentially stage 3 of >> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith >> ub.com >> <http://ub.com/>%2Fapache%2Fairflow%2Fissues%2F42632&data=05%7C02%7Cdavid.blain% >> 40infrabel.be%7Cc6eefb61c10240de8c0d08dd4759d0d0%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638745173290138741%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=XZMKiZvZqGAUmm9tNXbewmoZW%2ByaOXyzGmoA%2BaRWEYE%3D&reserved=0 >> that is planned and I want to make POC and indeed involve others in >> crowd-sourcing the change (similar to provider's move) after I figure out >> how to do it. >> >> J. >> >> >> On Fri, Feb 7, 2025 at 8:35 AM Blain David <david.bl...@infrabel.be> wrote: >> >>> Hello, >>> >>> >>> >>> The caplog vote triggered me to launch this proposal as it’s also >>> related to unit testing, and as I think we want our unit tests as >>> clean and as simple and as fast as possible. >>> >>> I think it would be a good practise to not define and create real >>> Airflow connections within the providers unit tests (which use the >>> Airflow test database), as normally when writing unit tests those >>> should be isolated and not be depend on any external systems like a >>> database. >>> >>> >>> >>> Also in my case those make the tests to run slower. Beside that I >>> ‘ve also noticed when working on some PR regarding providers, >>> sometimes there are some glitches within the CI/CD which seem to >>> cause issues with those “real” connections, causing tests to randomly fail. >>> >>> Hence that’s why when I do refactorings in provider unit tests, I’ve >>> already replaced those real connections with mocked ones making tests >>> run faster locally (no database needed) and no more random failures >>> during tests (possibly preceding tests that mess up connections). >>> >>> That’s doesn’t mean we don’t want to use the database of course >>> during tests, I’m just saying it’s a bit of overkill to use a >>> database in a unit test just to get a connection. >>> >>> >>> >>> We could also create a common mocking method for connections in >>> tests_common and use it across all unit tests, now those are mostly >>> redefined across different provider tests. >>> >>> >>> >>> Of course I’m willing to contribute on this one, what do you think >>> about this idea? Personally, I think this can only make maintenance >>> easier (and prevent random failures and faster tests results). >>> >>> >>> >>> Curious of your thoughts. >>> >>> >>> >>> Kind regards, >>> >>> David >>> >>> >>> >>> >>> >>> *David Blain* >>> >>> Data Engineer *at* ICT-514 - BI End User Reporting >>> >>> >>> >>> >>> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > For additional commands, e-mail: dev-h...@airflow.apache.org > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > <mailto:dev-unsubscr...@airflow.apache.org> > For additional commands, e-mail: dev-h...@airflow.apache.org > <mailto:dev-h...@airflow.apache.org>