[GitHub] [airflow] potiuk edited a comment on pull request #14531: Running tests in parallel for self-hosted runners

GitBox Tue, 02 Mar 2021 18:52:24 -0800


potiuk edited a comment on pull request #14531:
URL: https://github.com/apache/airflow/pull/14531#issuecomment-789386354



   > Have you looked at pytest-xdist?
   
   As I have already explained several times - yes, different people have 
looked at it several times in the past. The first time was 2 years ago, and 
the results scared us. The results were always the same - it failed big time. 
You can try it yourself if you want.
   
   A large number of our tests rely on a shared database which is reset, 
reinitialized, and filled with data or cleaned by multiple tests. The database 
is a shared-state resource, and if we try to run tests in parallel using 
xdist, they overwrite each other's data and fail completely randomly. There is 
no way to run parallel tests sharing the same database (unless we completely 
redefine our tests and always mock the database, or somehow isolate all the 
tests that use the DB).
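
   A minimal sketch of that failure mode, assuming an in-memory SQLite database 
as a stand-in for the shared test DB. The table and function names are 
hypothetical and not taken from the Airflow test suite, and the interleaving is 
replayed by hand in one thread to keep the outcome deterministic:

```python
import sqlite3


def reset(conn):
    # Typical test setup: wipe the shared table before the test runs.
    conn.execute("DELETE FROM task")


def insert(conn, name):
    conn.execute("INSERT INTO task VALUES (?)", (name,))


def count(conn):
    return conn.execute("SELECT COUNT(*) FROM task").fetchone()[0]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task (name TEXT)")
    return conn


def run_sequentially():
    # Test A runs to completion, then test B: each sees exactly its own row.
    conn = make_db()
    reset(conn)
    insert(conn, "a")
    seen_by_a = count(conn)
    reset(conn)
    insert(conn, "b")
    seen_by_b = count(conn)
    return seen_by_a, seen_by_b


def run_interleaved():
    # One possible ordering when two workers share the same database:
    # both tests reset first, then both insert, then both check.
    conn = make_db()
    reset(conn)        # test A setup
    reset(conn)        # test B setup
    insert(conn, "a")  # test A body
    insert(conn, "b")  # test B body
    return count(conn), count(conn)
```

   Both tests expect to see one row. Run one after the other they do, but with 
the interleaved ordering each of them sees two, so an assertion on the row 
count would fail in both.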
   
   In particular, quite a lot of tests from the core (including the scheduler 
ones - I guess you are familiar with those) actually use the DB and are not 
ready to be run in parallel. I think you can best imagine what starts 
happening if you run multiple scheduler tests against the same database in 
parallel. But those are not the only ones. The Airflow test suite encourages 
people to use the DB in their tests.
   
   There are also a number of tests that rely on side effects from other tests 
(stored in the same database).  
   
   My solution solves this by giving every test type its own database (and 
within each "type group" the tests are run sequentially). The groups are then 
run in parallel. This way they cannot overwrite each other's data, and 
since I already run each group separately in the 'sequential' approach, I know 
that side effects are at least 'under control' (i.e. they do not show up).
   
   Of course, if we lived in an ideal world we would have no side effects and 
no shared database, in which case pytest-xdist would work for us. But, 
unfortunately, our small Airflow world is not perfect. Making it perfect 
would likely take weeks or months of work isolating, fixing and mocking out 
all the tests. It's a great regret of mine, but I believe no one in the 
community has time for it, so I prefer to implement what gives an immediate 
effect with very little effort.
   
   But I do encourage you to take on the task and fix all the tests. It would 
be great to have that at some point. It would also likely drive down 
the time needed to run the tests even more.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
