potiuk commented on code in PR #35160:
URL: https://github.com/apache/airflow/pull/35160#discussion_r1376495329
##########
tests/operators/test_python.py:
##########
@@ -844,17 +848,30 @@ def f(exit_code):
assert ti.state == expected_state
+venv_cache_path = tempfile.mkdtemp(prefix="venv_cache_path")
Review Comment:
Yeah. Isolation is key.
And we cannot get isolation when many of our tests (most of the DB ones) do
something like this:
```python
@pytest.fixture(autouse=True)
def base_tests_setup(self, request, create_task_instance_of_operator, dag_maker):
    self.dag_id = f"dag_{slugify(request.cls.__name__)}"
    self.task_id = f"task_{slugify(request.node.name, max_length=40)}"
    self.run_id = f"run_{slugify(request.node.name, max_length=40)}"
    self.ds_templated = self.default_date.date().isoformat()
    self.ti_maker = create_task_instance_of_operator
    self.dag_maker = dag_maker
    self.dag = self.dag_maker(self.dag_id, template_searchpath=TEMPLATE_SEARCHPATH).dag
    clear_db_runs()
    yield
    clear_db_runs()
```
This is actual code from `BasePythonTest`, which is the base for all the Python
tests. If you look at the last three lines, you will see that all dag_runs are
always cleared before and after every test. This is the implementation:
```python
def clear_db_runs():
    with create_session() as session:
        session.query(Job).delete()
        session.query(Trigger).delete()
        session.query(DagRun).delete()
        session.query(TaskInstance).delete()
```
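To illustrate why wipe-everything cleanup is unsafe under parallelism, here is a minimal, self-contained sketch of the alternative: scoping the deletion to the test's own `dag_id` so cleanup cannot clobber rows belonging to tests running in parallel. This uses plain `sqlite3` with a hypothetical table and function name, not the real Airflow models:

```python
import sqlite3

# Hypothetical in-memory stand-in for the shared Airflow metadata DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (dag_id TEXT, run_id TEXT)")

# Two "parallel" tests each insert their own runs.
conn.execute("INSERT INTO dag_run VALUES ('dag_test_a', 'run_1')")
conn.execute("INSERT INTO dag_run VALUES ('dag_test_b', 'run_1')")

def clear_db_runs_scoped(conn, dag_id):
    # Delete only this test's rows instead of wiping the whole table,
    # so cleanup cannot race with other tests sharing the database.
    conn.execute("DELETE FROM dag_run WHERE dag_id = ?", (dag_id,))

# Test A cleans up after itself; test B's rows survive.
clear_db_runs_scoped(conn, "dag_test_a")
remaining = [row[0] for row in conn.execute("SELECT dag_id FROM dag_run")]
print(remaining)  # only dag_test_b remains
```

Even a scoped cleanup like this only helps if every test adopts it consistently, which is exactly the enforcement problem described below.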
This is of course key for the test to have proper setup/teardown, but it also
means that if it is run via `pytest-xdist` it will randomly delete all the
dag_runs while other tests are using the same database... (and likely most of
the 7000 other DB tests are doing something very similar to achieve
reproducibility).
Not good. Not good at all.
And even if we could fix it in some tests, there is no way to prevent
someone from sneaking in a fixture or setup_method with
`session.query(DagRun).delete()` in order to "clean up" before and after the
test (which is a good idea as long as it does not impact other tests running in
parallel). Not when we merge 30 PRs a day. And one faulty test can start
randomly breaking any other test; chasing such a "bad apple" would be a nightmare.
That's why I think there are three reasonable approaches:
1) Split the DB tests into serialized test groups, where tests are executed
sequentially within a group and each group has its own DB. This is what we have
now. On our self-hosted runners in CI we have up to 8 (this is how many CPUs we
have) complete docker compose instances running - each of them having a
separate DB and each of them running a subset of tests (TEST_TYPE)
sequentially. On public runners we have up to 2 of those (because we only have
2 CPUs). This has powered our tests for years and saves countless hours of
build time by utilising the parallelism of those instances - I implemented it a
long time ago :)
2) Separate out the tests that do not use the DB (this is what this PR aims
for), run them with xdist, and add a mechanism to keep it working in the
future - detecting DB tests running in an `xdist` environment. The good thing
is that we can "graduate" some tests from DB to non-DB easily after this PR
gets merged (just make them not use the DB and remove the marker). This is what
I mean by going from 7000 -> 1500 DB tests. That should be our goal IMHO
(roughly - I made that 1500 number up of course; I just think about 10% of our
tests should be DB tests).
3) Give each DB test its own DB. That is probably far too costly as an
overhead (especially since with DB tests we likely want to test all supported
DBs). I once attempted to expand the approach from 1) and have a separate DB
"per test module" rather than "per test group" - so instead of the 12 or so
TEST_TYPEs we would have had 200 or so separately started docker-compose
instances with DB and CI containers (up to 8 at a time, of course) - but the
overhead of starting/stopping images and DBs was so huge that the tests took
far longer than now to complete (at least 5 times longer, I think).
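The detection mechanism mentioned in 2) could look roughly like this. A hedged sketch only: the helper name, marker name, and the simulated collection result are all hypothetical, though pytest-xdist really does set `PYTEST_XDIST_WORKER` in its worker processes:

```python
import os

def db_tests_to_flag(collected, db_marker="db_test"):
    # Hypothetical helper: when running under pytest-xdist (workers have
    # PYTEST_XDIST_WORKER set), return the names of collected tests that
    # carry the DB marker, so a conftest hook could fail or deselect them
    # instead of letting them race on the shared database.
    if "PYTEST_XDIST_WORKER" not in os.environ:
        return []
    return [name for name, markers in collected if db_marker in markers]

# Simulated collection result: (test name, marker set) pairs.
collected = [("test_python_venv", set()), ("test_dagrun_cleanup", {"db_test"})]
os.environ["PYTEST_XDIST_WORKER"] = "gw0"  # pretend we are an xdist worker
flagged = db_tests_to_flag(collected)
print(flagged)  # the DB test is flagged
```

In a real conftest this logic would live in a `pytest_collection_modifyitems` hook reading `item.get_closest_marker(...)`.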
So I believe 1) is a good middle ground between parallelism and isolation,
and once we get a far smaller number of DB tests by following route 2) and
moving more tests to be non-DB, we will have an "optimal" solution. I think
chasing "let's get the DB tests running with xdist" might not achieve the goal
due to overhead/complexity/flakiness.
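The parallelism shape of approach 1) - groups running in parallel, tests within a group running sequentially against the group's own DB - can be sketched as follows, with a thread pool standing in for the docker-compose instances (group and test names are made up):

```python
from concurrent.futures import ThreadPoolExecutor

GROUPS = ["Core", "Providers", "WWW"]  # hypothetical TEST_TYPE groups

def run_group(group):
    # In CI each group would talk to its own database (its own
    # docker-compose project); tests inside a group run sequentially,
    # so intra-group DB cleanup cannot race.
    return [f"{group}:{test}" for test in ("test_a", "test_b")]

# Up to 2 groups at a time - mimicking public runners with 2 CPUs.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_group, GROUPS))
print(results)
```

The isolation boundary is the group, not the test, which is what keeps the DB startup/teardown overhead amortised.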
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]