michaelmicheal commented on code in PR #29441:
URL: https://github.com/apache/airflow/pull/29441#discussion_r1102819882


##########
airflow/www/views.py:
##########
@@ -3715,7 +3715,6 @@ def next_run_datasets(self, dag_id):
                     DatasetEvent,
                     and_(
                         DatasetEvent.dataset_id == DatasetModel.id,
-                        DatasetEvent.timestamp > DatasetDagRunQueue.created_at,

Review Comment:
   > However, I would also remove the and_ around it since then there would 
only be one filter condition in that join:
   
   Yes, you're right; the `and_` becomes unnecessary.
   
   I think there might be some confusion around DDRQ. My understanding is that 
when a `DatasetEvent` is created, a `DatasetDagRunQueue` (DDRQ) record is 
created for each consuming DAG. Then, once a DAG has an associated DDRQ record 
for every `Dataset` it depends on, a DagRun is created and all DDRQ records 
associated with that DAG are deleted.
   
   > If you go for option 2, I think you should be able to compare the 
existence and creation time of the DDRQ with the DatasetEvent timestamp to 
figure out whether or not the last update time has already triggered a 
DDRQ/DagRun or if it has partially satisfied the conditions of a future DagRun.
   
   As I understand it, if there are DDRQ records for a DAG, we can assume that 
no DagRun has been triggered since the last `DatasetEvent` (because we delete 
DDRQ records when a DagRun is created).
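   Under that reading, the existence check alone distinguishes the two cases the 
quoted suggestion describes. A sketch, using hypothetical in-memory record 
tuples rather than Airflow's ORM (`ddrq_rows` and `dataset_pending_for` are 
made up for illustration):

   ```python
   from datetime import datetime, timedelta

   # Hypothetical DDRQ rows: (dag_id, dataset_id, created_at).
   now = datetime(2023, 2, 10, 12, 0)
   ddrq_rows = [("consumer_a", "ds1", now - timedelta(minutes=5))]

   def dataset_pending_for(dag_id: str, dataset_id: str) -> bool:
       """If a DDRQ row exists, the latest DatasetEvent has only partially
       satisfied the conditions of a future DagRun for this DAG; if no row
       exists, any earlier event has already been consumed by a DagRun
       (DDRQ rows are deleted when the run is created)."""
       return any(d == dag_id and ds == dataset_id for (d, ds, _) in ddrq_rows)

   print(dataset_pending_for("consumer_a", "ds1"))  # row present -> pending
   print(dataset_pending_for("consumer_b", "ds1"))  # no row -> already consumed
   ```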


