jens-scheffler-bosch commented on PR #30699: URL: https://github.com/apache/airflow/pull/30699#issuecomment-1513608529
> Nice small change with a big impact :) I'm not familiar with SQLAlchemy, so I'll rephrase my comprehension of your change here to make sure I understand well what you're doing (and tell me if I'm wrong): The method `next_dagruns_to_examine` claimed to return a List here https://github.com/aws-mwaa/upstream-to-airflow/blob/1ebeb19bf7542850fff2f1e2f9795ad70c1b24e2/airflow/models/dagrun.py#L294 but this type annotation was wrong, as it was returning a query, which was lazily run every time it was iterated. So calling the method once but iterating twice on the results resulted in the query being executed twice.

You are right. The annotation is correct, though: a list is returned, but it is lazily evaluated by SQLAlchemy.

> Adding this `.all()` forces evaluation on the spot and actually returns a List.

Correct. The optimization is in the usage at https://github.com/apache/airflow/blob/main/airflow/jobs/scheduler_job_runner.py#L1326, where lines 1329 and 1351 each iterate over the result. Because it is lazily evaluated, the query is executed twice, even though the code only wants to loop twice over the same (static) list.

> Did you measure the improvement brought by this PR? If so, how? Do you have any result to share?

The improvement depends on your DAG queue length and DB query performance. Together with (and before) the other PR, we had this query running for 5-15 seconds, twice. Setting aside that the query itself is in some cases sub-optimal in our scheduler loop (another PR will address that), we immediately saved 50% of the time in this section == 5-15 seconds per scheduler loop. And even if the queue is not too long and the query only takes 50-100 ms, you still save 50% of the DB effort :-D

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
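To make the lazy-evaluation point concrete, here is a minimal pure-Python sketch (a toy stand-in, not actual SQLAlchemy or Airflow code): `LazyQuery` mimics how iterating a SQLAlchemy `Query` re-executes it against the database on every pass, while an `.all()`-style call executes once and returns a plain list that can be iterated any number of times.

```python
class LazyQuery:
    """Toy stand-in for a SQLAlchemy Query object.

    Each iteration "executes the query" again, like a real Query
    hitting the database every time it is iterated.
    """

    def __init__(self):
        self.executions = 0  # counts how often the "query" ran

    def __iter__(self):
        self.executions += 1  # a real Query would hit the DB here
        return iter(["dagrun_1", "dagrun_2"])

    def all(self):
        # Like Query.all(): execute once, materialize a plain list.
        return list(self)


# Lazy usage: two loops over the query -> two executions.
lazy = LazyQuery()
for _ in lazy:
    pass
for _ in lazy:
    pass
assert lazy.executions == 2

# Eager usage: .all() executes once; the list is then reused freely.
eager = LazyQuery()
rows = eager.all()
for _ in rows:
    pass
for _ in rows:
    pass
assert eager.executions == 1
```

This is exactly the shape of the fix: the scheduler loop iterates the result twice, so materializing it once with `.all()` halves the number of query executions.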
