potiuk commented on PR #33309:
URL: https://github.com/apache/airflow/pull/33309#issuecomment-1674812225

   Hey Herel. While doing my "flaky test squashing" quest, and investigating 
the flaky "session" test #33178 I deduced that there must be something deep in 
the mapped operator implementation that causes sessions leak during the tests, 
that might actually be also causing similar problems in production.
   
   It turned out that the problems are (almost for sure) caused by the way we 
retrieved (and not closed/discarded) DB session while rendering templates in 
mapped tasks. I discussed it with @uranusjr and while doing some experiments 
(mainly in "Chasing leaking sessions" PR 
https://github.com/apache/airflow/pull/33259 I realized that it has an 
unintended side effect that we are opening more ssessions that necessary when 
runnning mapped task - basically we always open one session for the 
"_run_raw_task" and then another session just to "render templates". 
   
   There is a good reason for doing it (we cannot easily pass sesssion through 
usual @provide_session mechanism because "render_template" method's signature 
need to be compatible with non-mapped object, but the way we've done that was 
pretty reasource-heavy and brittle. It opened a new session and basically it 
had not closed the session, because of recursive behaviour it was quite 
possible the session could be close too early. 
   
   In the discussion with @uranusjr 
https://github.com/apache/airflow/issues/33178#issuecomment-1674206630  it 
turned out that this method is only actually called in the context of 
`_run_raw_task` - so I figured that we can do it much more efficiently and 
without the "brittleness" - i.e. we could have a context manager that will 
store the "task" session in the global data and then render_task_template of 
mapped operator could use that globally stored session rather than create a new 
one.
   
   I think it's much safer/stable and less resource hungry (just single session 
needed for the mapped task, not two) - but also far more robust (the context 
manager will close the session when it leaves the _run_raw_task. It's also is 
going to solve the test flakiness I think (i will close /reopen this PR few 
times to see if the problem is gone).
   
   I cannot see any bad side-effects of the change - I already fixed all the 
tests that need the context manager and added the context manager to the CLI 
method that also uses render_templates (the only place besides "_run_raw_task" 
that apparently uses rendering). 
   
   I'd love those who have a bit more experience in those parts (@ashb 
@uranusjr) to take a look and see if I have not missed anything - this one 
might be a good idea for 2.7.1 (or maybe even we could add it to rc2 of 2.7.0 
if we have one).
   
   Looking forward to comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to