potiuk commented on pull request #14531: URL: https://github.com/apache/airflow/pull/14531#issuecomment-789386354
> Have you looked at pytest-xdist?

As I already explained several times: yes, different people have looked at it several times in the past. The first time was 2 years ago, and we got scared by the results. The results were always the same: it failed big time. You can try it yourself if you want.

A big number of our tests rely on a shared database which is reset, reinitialized, and filled with data or cleaned by multiple tests. The shared database is a shared-state resource, and if we try to run tests in parallel using xdist, they overwrite each other's data and fail completely randomly. There is no way to run parallel tests sharing the same database (unless we completely redefine our tests and always mock the database, or somehow isolate all the tests that use the DB). In particular, quite a lot of tests from the core (including the scheduler ones, which I guess you are familiar with) actually use the DB and are not ready to be run in parallel. I think you can best imagine what starts happening if you run those multiple scheduler tests against the same database in parallel. But those are not the only ones. The Airflow test suite encourages people to use the DB during their testing. There are also a number of tests that rely on side effects from other tests (stored in the same database).

My solution solves it by giving every type of test its own database (and the tests within each group are run sequentially). This way they cannot overwrite each other's data, and since I already run each group separately in the 'sequential' approach, I know that side effects are at least 'under control' (i.e. they do not show up).

Of course, if we lived in an ideal world we would have no side effects and no shared database, in which case pytest-xdist would work for us. But, unfortunately, our small Airflow world is not perfect. And fixing it to be perfect and ideal could likely take weeks or months of work isolating, fixing, and mocking out all the tests.
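To make the "one database per test type" idea concrete, here is a minimal sketch. It is not the implementation in this PR: the group names, the test names, the `run_group` and `run_all` helpers, and the use of SQLite files are all assumptions chosen for illustration. Each group gets its own database file, tests inside a group run one after another, and the groups themselves run in parallel.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical test groups; the real test types in Airflow may differ.
TEST_GROUPS = {
    "core": ["test_scheduler", "test_dag_run"],
    "providers": ["test_hook", "test_operator"],
}


def run_group(group, tests, workdir):
    """Run one group's tests sequentially against that group's own database."""
    conn = sqlite3.connect(workdir / f"{group}.db")  # one DB file per group
    results = []
    try:
        conn.execute("CREATE TABLE state (test TEXT)")
        for test in tests:
            # Stand-in for a real test: reset the table, write a row, read it back.
            conn.execute("DELETE FROM state")
            conn.execute("INSERT INTO state VALUES (?)", (test,))
            conn.commit()
            (count,) = conn.execute("SELECT COUNT(*) FROM state").fetchone()
            results.append((test, count))
    finally:
        conn.close()
    return results


def run_all(groups):
    """Run the groups in parallel; tests within a group stay sequential."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        with ThreadPoolExecutor(max_workers=len(groups)) as pool:
            futures = {
                name: pool.submit(run_group, name, tests, workdir)
                for name, tests in groups.items()
            }
            return {name: future.result() for name, future in futures.items()}
```

Because no two groups ever touch the same database file, a reset in one group cannot wipe data that a test in another group is relying on. Side effects between tests inside a group still exist, but they occur in the same order as in a fully sequential run.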
It's a great regret of mine, but I believe no one in the community has time for it, so I prefer to implement what can give an immediate effect and can be implemented with very small effort. But I do encourage you to take on the task and fix all the tests. It would be perfect to have that at some point in time. It would also likely drive down the time needed to run the tests even more.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
