jens-scheffler-bosch commented on PR #30699:
URL: https://github.com/apache/airflow/pull/30699#issuecomment-1513608529

   > Nice small change with a big impact :) I'm not familiar with SQLAlchemy, 
so I'll rephrase my comprehension of your change here to make sure I understand 
well what you're doing (and tell me if I'm wrong): The method 
`next_dagruns_to_examine` claimed to return a List here 
https://github.com/aws-mwaa/upstream-to-airflow/blob/1ebeb19bf7542850fff2f1e2f9795ad70c1b24e2/airflow/models/dagrun.py#L294
 but this type annotation was wrong, as it was returning a query, which was 
lazily ran every time it was iterated. So calling the method once but iterating 
twice on the results resulted in the query being executed twice.
   
   You are right. Annotation is correct though, a list is returned but it is 
lazily evaluated by SQLAlchemy.
   
   > Adding this `.all()` forces evaluation on the spot and actually returns a 
List.
   
   Correct.
   The optimization is in the the usage of 
https://github.com/apache/airflow/blob/main/airflow/jobs/scheduler_job_runner.py#L1326
 whereas in lines 1329+1351 two times an iteration is made over the list. As it 
is lazy evaluated the query is executed two times. But actually the code wants 
to loop two times over the (static) list.
   
   > Did you measure the improvement brought by this PR ? If so, how ? Do you 
have any result to share ?
   
   It is an improvement depending n your DAG queue length and DB query 
performance. Together with/before the other PR we had this query running for 
5-15 seconds times two. Besides (another PR will do this) the query is in some 
cases sub-optimal in our scheduler loop we immediately saved 50% of time in 
this section == 5-15 seconds per scheduler loop.
   But if the queue is not too long, the query will take 50-100ms, then you 
still save 50% of DB efforts :-D


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to