Mock and Connection don’t make sense together. A Connection is effectively a 
dataclass that is loaded from the DB. There’s nothing to mock.

Also are you aware that you can set `ARIFLOW_CONN_*` environment variables and 
those will be looked at before the DB. That plus 
[monkeypatch.setenv](https://docs.pytest.org/en/stable/reference/reference.html#pytest.MonkeyPatch.setenv)
 might be all we need, and a custom fixture in tests_common/pytest_plugin.py?

Though again, I question how reading a row from the DB by it’s PK can be slow 
even if you read it 1000 times in a test…



> On 7 Feb 2025, at 17:43, Blain David <david.bl...@infrabel.be> wrote:
> 
>> Pulling a connection from the DB itself shouldn’t/can’t be slow - It’s a 
>> single row. I think I’m just confused or misdirected about your comment 
>> about database here. Can you give a concrete example of the change you would 
>> make, and how this will speed things up?
> 
> It's not only pulling, I saw some tests creating multiple test connections 
> before each tests, and the those connections being retrieved multiple times 
> within tests, not that it's super slow, but it won't make tests lightning 
> fast either imho.
> 
>> So what are you actually proposing?
> 
> I would propose a common test helper function which returns a mocked Airflow 
> connection, and for the providers we will need to mock the SDK once direct DB 
> connection won't be possible anymore (which is good).
> 
>> We have to be aware of making our tests overly fragile if we replace 
>> everything with mocks, then we are only testing our mocks and not the real 
>> behaviour.
> 
> Normally this behaviour should be tested in dedicated integration tests, not 
> on each unit test for hooks/operators/sensors/triggers and could be tested in 
> a generic matter imho.  That’s what I mean with "slow down", too much tests 
> are doing this (especially in the providers).  But I agree (and very good 
> point) we should be careful and make sure we don't purely rely tests on 
> mocks, hence why integration tests.  I've stumbled this week on a test in 
> Airflow provider where the mocked feature was indeed making the test succeed, 
> but was in fact lacking a real implementation (e.g. Elasticsearch) thus not 
> working at all.  Such things (only testing mocks or tests that in fact test 
> nothing like you suggest) can be detected through mutation testing, but those 
> are slow and expensive, not that they should run each time (maybe only on 
> master for example), but could help monitor the quality of the tests, so it 
> would have detected the case I described and the point that you made (e.g. 
> fragile tests).
> 
> -----Original Message-----
> From: Ash Berlin-Taylor <a...@apache.org <mailto:a...@apache.org>> 
> Sent: Friday, 7 February 2025 10:28
> To: dev@airflow.apache.org <mailto:dev@airflow.apache.org>
> Subject: Re: [PROPOSAL] Remove creation of real Airflow connections in 
> provider unit tests
> 
> EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze niet 
> vertrouwt, klik niet op een link of open geen bijlages. Bij twijfel, stuur 
> deze e-mail als bijlage naar ab...@infrabel.be 
> <mailto:ab...@infrabel.be><mailto:ab...@infrabel.be>.
> 
> The providers tests will soon (but possibly not before 3.0 at this point) 
> need to be converted to use the TaskSDK properly which won’t/can't actually 
> use the DB, so we will need to do something soon.
> 
>> Hence that’s why when I do refactorings in provider unit tests, I’ve 
>> already replaced those real connections with mocked ones making tests 
>> run faster locally (no database needed)
> 
> Pulling a connection from the DB itself shouldn’t/can’t be slow - It’s a 
> single row. I think I’m just confused or misdirected about your comment about 
> database here. Can you give a concrete example of the change you would make, 
> and how this will speed things up?
> 
> To my mind creating/obtaining the Connection object isn’t the slow part, but 
> doing anything with that connection. But connections don’t actually do the 
> connecting/opening sockets/network requests — that’s all in the Hook classes.
> 
> So what are you actually proposing?
> 
> We have to be aware of making our tests overly fragile if we replace 
> everything with mocks, then we are only testing our mocks and not the real 
> behaviour.
> 
> -ash
> 
> 
>> On 7 Feb 2025, at 08:50, Jarek Potiuk <ja...@potiuk.com> wrote:
>> 
>> +10 on that. My next step after finishing Provider's move, was to make
>> essentially all unit tests in Providers non-DB tests and removing 
>> "real connection" usage is part of it.
>> 
>> This is essentially stage 3 of
>> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
>> ub.com 
>> <http://ub.com/>%2Fapache%2Fairflow%2Fissues%2F42632&data=05%7C02%7Cdavid.blain%
>> 40infrabel.be%7Cc6eefb61c10240de8c0d08dd4759d0d0%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638745173290138741%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=XZMKiZvZqGAUmm9tNXbewmoZW%2ByaOXyzGmoA%2BaRWEYE%3D&reserved=0
>>  that is planned and I want to make POC and indeed involve others in 
>> crowd-sourcing the change (similar to provider's move) after I figure out 
>> how to do it.
>> 
>> J.
>> 
>> 
>> On Fri, Feb 7, 2025 at 8:35 AM Blain David <david.bl...@infrabel.be> wrote:
>> 
>>> Hello,
>>> 
>>> 
>>> 
>>> The caplog vote triggered me to launch this proposal as it’s also 
>>> related to unit testing, and as I think we want our unit tests as 
>>> clean and as simple and as fast as possible.
>>> 
>>> I think it would be a good practise to not define and create real 
>>> Airflow connections within the providers unit tests (which use the 
>>> Airflow test database), as normally when writing unit tests those 
>>> should be isolated and not be depend on any external systems like a 
>>> database.
>>> 
>>> 
>>> 
>>> Also in my case those make the tests to run slower.  Beside that I 
>>> ‘ve also noticed when working on some PR regarding providers, 
>>> sometimes there are some glitches within the CI/CD which seem to 
>>> cause issues with those “real” connections, causing tests to randomly fail.
>>> 
>>> Hence that’s why when I do refactorings in provider unit tests, I’ve 
>>> already replaced those real connections with mocked ones making tests 
>>> run faster locally (no database needed) and no more random failures 
>>> during tests (possibly preceding tests that mess up connections).
>>> 
>>> That’s doesn’t mean we don’t want to use the database of course 
>>> during tests, I’m just saying it’s a bit of overkill to use a 
>>> database in a unit test just to get a connection.
>>> 
>>> 
>>> 
>>> We could also create a common mocking method for connections in 
>>> tests_common and use it across all unit tests, now those are mostly 
>>> redefined across different provider tests.
>>> 
>>> 
>>> 
>>> Of course I’m willing to contribute on this one, what do you think 
>>> about this idea?  Personally, I think this can only make maintenance 
>>> easier (and prevent random failures and faster tests results).
>>> 
>>> 
>>> 
>>> Curious of your thoughts.
>>> 
>>> 
>>> 
>>> Kind regards,
>>> 
>>> David
>>> 
>>> 
>>> 
>>> 
>>> 
>>> *David Blain*
>>> 
>>> Data Engineer *at* ICT-514 - BI End User Reporting
>>> 
>>> 
>>> 
>>> 
>>> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org 
> <mailto:dev-unsubscr...@airflow.apache.org>
> For additional commands, e-mail: dev-h...@airflow.apache.org 
> <mailto:dev-h...@airflow.apache.org>

Reply via email to