[GitHub] [airflow] potiuk commented on issue #33178: Flaky `test_xcom_map_error_fails_task` test

via GitHub Tue, 08 Aug 2023 00:58:37 -0700


potiuk commented on issue #33178:
URL: https://github.com/apache/airflow/issues/33178#issuecomment-1669109395


   I have several hypotheses around that.
   
   1) `run()` should not be used in tests
   
   I had `run()` is wrong hypothesis (that you always need to set 
session=session when you dag_run.run() or ti.run() ). 
   But I looked closer and @provide_session decorator already closes the 
session when it exits, so that should not be the case (unless of course those 
run method has some bug in them that session is not really passed down but 
re-created again).
   
   So I think now that changing everywhere  `run(session=session)` is not a 
solution 
   
   2) It's possible that other tests are leaking open sessions as you explained
   
   However I find it pretty strange that those test failures are occuring only 
in the "mapped" versions of tests. There is - I believe - an easy way to verify 
this hypothesis. I just created this PR #33190 with "full tests needed" - where 
I added an `autouse=True` fixture which will close all open sqlalchemy sessions 
before AND after every test function. This should handle all cases and even 
cases where pytest will generally run each tests in a different thread - pytest 
fixture is guaranteed to run in one test, before the other test is even 
attempted - so even if there are other threads around, this should handle it.
   
   I **believe** this is even something that we can geneally leave "for good" 
in our `conftest.py`. I don't think it will slow down our test suite that much 
(it will be maybe 10s of seconds rather than minutes) - but it should give use 
perfect isolation in case there are in-fact any sessions leaked from other 
tests. Let's see if it will change anything.
   
   However, when we exclude that (let's see if we can see the flaky tests with 
the "close_all_sessions". then whatever is left must be the reason.
   
   3) My 3rd hypothesis is that there is a bug somewhere deep in the mapped 
code that causes it. 
   
   The flaky tests are only mapping related. I run probably few 100s (if not 
thousands) of test jobs recently in my quest to squash the flaky tests and 
those errors only occur in tests which are related to mapped task instances. 
Not even once I saw it in other tests. And I saw about 30-40 of such failuers 
already. 
   
   For me this is likely that somewhere deep-down the stack there is one 
forgotten place where we do not pass session but create a new one and they are 
used together. We just need to find that place. But let's see after I run the 
"close_all_sessions" change quite a few times and let's see if we can see it 
happening there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [airflow] potiuk commented on issue #33178: Flaky `test_xcom_map_error_fails_task` test

Reply via email to