jens-scheffler-bosch commented on code in PR #30699:
URL: https://github.com/apache/airflow/pull/30699#discussion_r1194314522
##########
airflow/models/dagrun.py:
##########
@@ -344,7 +344,7 @@ def next_dagruns_to_examine(
return with_row_locks(
query.limit(max_number), of=cls, session=session,
**skip_locked(session=session)
- )
+ ).all()
Review Comment:
I would agree to a local optimization if the `_do_scheduling()` would exit
from the iteration conditionally and such would save not retrieving all()
elements. But it makes also a full walk over the result set. A lazy evaluation
will consume the same memory just a few statements later.
I still propose to keep it here generic because (1) there is no other code
actually suffering from this and (2) the method name rather expects to get a
result and not a lazy evaluated query object. Like in the point we optimize
here (usage in `scheduler_job_runner.py` lines 1299 and 1321) in this proposal
also future changes might run into the same pitfall of lazy evaluation so that
a query is executed by accident multiple times.
I our case w/o DB optimization the query was taking ~15 seconds and was
executed twice, took a lot of debugging to understand why it is fired two times.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]