mik-laj commented on a change in pull request #6792: [AIRFLOW-5930] Use 
cached-SQL query building for hot-path queries
URL: https://github.com/apache/airflow/pull/6792#discussion_r356991452
 
 

 ##########
 File path: airflow/jobs/scheduler_job.py
 ##########
 @@ -1057,30 +1027,34 @@ def _find_executable_task_instances(self, simple_dag_bag, states, session=None):
         TI = models.TaskInstance
         DR = models.DagRun
         DM = models.DagModel
-        ti_query = (
-            session
-            .query(TI)
-            .filter(TI.dag_id.in_(simple_dag_bag.dag_ids))
+        ti_query = BAKED_QUERIES(
+            lambda session: session.query(TI).filter(
+                TI.dag_id.in_(simple_dag_bag.dag_ids)
+            )
             .outerjoin(
                 DR,
                 and_(DR.dag_id == TI.dag_id, DR.execution_date == TI.execution_date)
             )
-            .filter(or_(DR.run_id == None,  # noqa: E711 pylint: disable=singleton-comparison
-                    not_(DR.run_id.like(BackfillJob.ID_PREFIX + '%'))))
+            .filter(or_(DR.run_id.is_(None),
+                        not_(DR.run_id.like(BackfillJob.ID_PREFIX + '%'))))
 
 Review comment:
   I really don't like filtering with a `like` expression. It makes the query
very difficult to optimize, because the condition cannot be stored in a simple
data structure. Instead we need a much more complex binary tree, which takes
more memory than a simple structure with 3 values and causes other problems,
e.g. an unbalanced tree and thus performance degradation.
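
To make the contrast concrete, here is a minimal sketch using the standard-library `sqlite3` module. It rests on assumptions: the table layout and the `run_type` column are hypothetical and are not part of this PR, and the prefix value `backfill_` is assumed for `BackfillJob.ID_PREFIX`. Treating the "structure with 3 values" as a run-type column is only one possible reading of the comment. The sketch shows that both filters select the same rows; it does not measure performance.

```python
import sqlite3

# Assumed value of BackfillJob.ID_PREFIX; illustrative only.
BACKFILL_PREFIX = "backfill_"

conn = sqlite3.connect(":memory:")
# Hypothetical table: run_type is NOT a column introduced by this PR.
conn.execute(
    "CREATE TABLE dag_run (id INTEGER PRIMARY KEY, run_id TEXT, run_type TEXT)"
)
conn.executemany(
    "INSERT INTO dag_run (run_id, run_type) VALUES (?, ?)",
    [
        ("scheduled__2019-12-01", "scheduled"),
        ("manual__2019-12-02", "manual"),
        ("backfill_2019-12-03", "backfill"),
    ],
)

# Approach in the diff: exclude backfill runs by pattern-matching the run_id.
like_rows = conn.execute(
    "SELECT run_id FROM dag_run WHERE run_id NOT LIKE ? ORDER BY id",
    (BACKFILL_PREFIX + "%",),
).fetchall()

# Alternative: compare a column that holds one of a few fixed values.
eq_rows = conn.execute(
    "SELECT run_id FROM dag_run WHERE run_type != ? ORDER BY id",
    ("backfill",),
).fetchall()

print(like_rows == eq_rows)
```

An equality comparison on a small, fixed set of values is the kind of condition a database can usually handle with a plain index or a simple scan, whereas a pattern match depends on how the engine treats `LIKE`.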

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
