potiuk commented on PR #33309: URL: https://github.com/apache/airflow/pull/33309#issuecomment-1674812225
Hey Herel. While doing my "flaky test squashing" quest, and investigating the flaky "session" test #33178 I deduced that there must be something deep in the mapped operator implementation that causes sessions leak during the tests, that might actually be also causing similar problems in production. It turned out that the problems are (almost for sure) caused by the way we retrieved (and not closed/discarded) DB session while rendering templates in mapped tasks. I discussed it with @uranusjr and while doing some experiments (mainly in "Chasing leaking sessions" PR https://github.com/apache/airflow/pull/33259 I realized that it has an unintended side effect that we are opening more ssessions that necessary when runnning mapped task - basically we always open one session for the "_run_raw_task" and then another session just to "render templates". There is a good reason for doing it (we cannot easily pass sesssion through usual @provide_session mechanism because "render_template" method's signature need to be compatible with non-mapped object, but the way we've done that was pretty reasource-heavy and brittle. It opened a new session and basically it had not closed the session, because of recursive behaviour it was quite possible the session could be close too early. In the discussion with @uranusjr https://github.com/apache/airflow/issues/33178#issuecomment-1674206630 it turned out that this method is only actually called in the context of `_run_raw_task` - so I figured that we can do it much more efficiently and without the "brittleness" - i.e. we could have a context manager that will store the "task" session in the global data and then render_task_template of mapped operator could use that globally stored session rather than create a new one. I think it's much safer/stable and less resource hungry (just single session needed for the mapped task, not two) - but also far more robust (the context manager will close the session when it leaves the _run_raw_task. It's also is going to solve the test flakiness I think (i will close /reopen this PR few times to see if the problem is gone). I cannot see any bad side-effects of the change - I already fixed all the tests that need the context manager and added the context manager to the CLI method that also uses render_templates (the only place besides "_run_raw_task" that apparently uses rendering). I'd love those who have a bit more experience in those parts (@ashb @uranusjr) to take a look and see if I have not missed anything - this one might be a good idea for 2.7.1 (or maybe even we could add it to rc2 of 2.7.0 if we have one). Looking forward to comments. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
