ashb commented on a change in pull request #4751: [AIRFLOW-3607] collected trigger rule dep check per dag run
URL: https://github.com/apache/airflow/pull/4751#discussion_r346970934
 
 

 ##########
 File path: airflow/jobs/scheduler_job.py
 ##########
 @@ -717,7 +718,10 @@ def _process_task_instances(self, dag, task_instances_list, session=None):
             run.dag = dag
             # todo: preferably the integrity check happens at dag collection time
             run.verify_integrity(session=session)
-            run.update_state(session=session)
+            finished_tasks = run.get_task_instances(state=State.finished() + [State.UPSTREAM_FAILED],
+                                                    session=session)
 
 Review comment:
   This works, but it asks for a lot more columns and rows than we need.
   
   We could try changing `get_task_instances` so that it returns the query itself (`return tis`) instead of the evaluated list (`return tis.all()`), and this line could become:
   
   ```python
               finished_tasks = run.get_task_instances(
                   state=State.finished() + [State.UPSTREAM_FAILED],
                   session=session,
               ).options(load_only("task_id", "state"))
   ```
   
   
https://docs.sqlalchemy.org/en/13/orm/loading_columns.html#load-only-and-wildcard-options
   
   Do you think this is worth it?
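
To make the suggestion concrete, here is a minimal, self-contained sketch of the pattern against an in-memory SQLite database. It is not Airflow code: `ToyTaskInstance`, its `hostname` column, and the standalone `get_task_instances` helper are hypothetical stand-ins, and the state strings are just sample values. The sketch passes class-bound attributes to `load_only` rather than the string names used above, on the assumption that this form is accepted across SQLAlchemy 1.3, 1.4 and 2.0.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import load_only, sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # SQLAlchemy 1.3

Base = declarative_base()


class ToyTaskInstance(Base):
    """Hypothetical stand-in for a task instance model (not Airflow's)."""

    __tablename__ = "toy_task_instance"

    id = Column(Integer, primary_key=True)
    task_id = Column(String(50))
    state = Column(String(20))
    hostname = Column(String(100))  # a column the caller does not need


def get_task_instances(session, states):
    """Return the unevaluated Query (no .all()) so callers can refine it."""
    return session.query(ToyTaskInstance).filter(
        ToyTaskInstance.state.in_(states)
    )


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add_all(
    [
        ToyTaskInstance(task_id="a", state="success", hostname="worker-1"),
        ToyTaskInstance(task_id="b", state="upstream_failed", hostname="worker-2"),
        ToyTaskInstance(task_id="c", state="running", hostname="worker-3"),
    ]
)
session.commit()
session.expunge_all()  # forget the inserted objects so the query builds fresh ones

# The caller narrows the SELECT to the columns it needs, then evaluates it.
finished = (
    get_task_instances(session, ["success", "failed", "upstream_failed"])
    .options(load_only(ToyTaskInstance.task_id, ToyTaskInstance.state))
    .all()
)

loaded_task_ids = sorted(ti.task_id for ti in finished)
# Attribute names that were not fetched for the first returned instance.
unloaded = inspect(finished[0]).unloaded
```

One trade-off to keep in mind: columns left out by `load_only` are deferred, not dropped, so reading one of them later (here `hostname`) issues an extra SELECT for that instance. The change only pays off if callers really do touch nothing beyond `task_id` and `state`.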
   
