potiuk commented on issue #33178: URL: https://github.com/apache/airflow/issues/33178#issuecomment-1669109395
I have several hypotheses around that. 1) `run()` should not be used in tests I had `run()` is wrong hypothesis (that you always need to set session=session when you dag_run.run() or ti.run() ). But I looked closer and @provide_session decorator already closes the session when it exits, so that should not be the case (unless of course those run method has some bug in them that session is not really passed down but re-created again). So I think now that changing everywhere `run(session=session)` is not a solution 2) It's possible that other tests are leaking open sessions as you explained However I find it pretty strange that those test failures are occuring only in the "mapped" versions of tests. There is - I believe - an easy way to verify this hypothesis. I just created this PR #33190 with "full tests needed" - where I added an `autouse=True` fixture which will close all open sqlalchemy sessions before AND after every test function. This should handle all cases and even cases where pytest will generally run each tests in a different thread - pytest fixture is guaranteed to run in one test, before the other test is even attempted - so even if there are other threads around, this should handle it. I **believe** this is even something that we can geneally leave "for good" in our `conftest.py`. I don't think it will slow down our test suite that much (it will be maybe 10s of seconds rather than minutes) - but it should give use perfect isolation in case there are in-fact any sessions leaked from other tests. Let's see if it will change anything. However, when we exclude that (let's see if we can see the flaky tests with the "close_all_sessions". then whatever is left must be the reason. 3) My 3rd hypothesis is that there is a bug somewhere deep in the mapped code that causes it. The flaky tests are only mapping related. I run probably few 100s (if not thousands) of test jobs recently in my quest to squash the flaky tests and those errors only occur in tests which are related to mapped task instances. Not even once I saw it in other tests. And I saw about 30-40 of such failuers already. For me this is likely that somewhere deep-down the stack there is one forgotten place where we do not pass session but create a new one and they are used together. We just need to find that place. But let's see after I run the "close_all_sessions" change quite a few times and let's see if we can see it happening there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
