tosheer commented on issue #38826:
URL: https://github.com/apache/airflow/issues/38826#issuecomment-2045595004

   @uranusjr I agree, adding `catchup` into the mix for datasets would introduce a lot of complexity, so we can avoid that for now. But we need to tackle this issue on both the publishing side and the consumption side.
   
   **Publishing Side:**
   - As soon as a DAG is deleted, remove its entry from `dag_schedule_dataset_reference` so that no entries are added to the `dataset_dag_run_queue` table for any of its dataset events. This is mainly to avoid a deleted DAG coming back and receiving all the events since its last run (a rough sketch follows this bullet).
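   
   A minimal sketch of that cleanup, assuming the existing ORM models `DagScheduleDatasetReference` and `DatasetDagRunQueue`; the function name and the place it would be called from are hypothetical and only illustrate the idea:
   
   ```python
   # Illustrative only: hook this into whatever code path handles DAG deletion.
   from sqlalchemy import delete
   from sqlalchemy.orm import Session
   
   from airflow.models.dataset import DagScheduleDatasetReference, DatasetDagRunQueue
   
   
   def cleanup_dataset_references(dag_id: str, session: Session) -> None:
       """Drop dataset-scheduling bookkeeping for a deleted DAG."""
       # Stop the scheduler from queueing new dataset events for this DAG.
       session.execute(
           delete(DagScheduleDatasetReference).where(
               DagScheduleDatasetReference.dag_id == dag_id
           )
       )
       # Drop anything already queued so a re-created DAG does not pick it up.
       session.execute(
           delete(DatasetDagRunQueue).where(DatasetDagRunQueue.target_dag_id == dag_id)
       )
   ```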
   
   **Consumption Side:**
   - For any new DAG, only deliver dataset events from the time it becomes active, not past events. For this, the dataset event query filter at https://github.com/apache/airflow/blob/main/airflow/jobs/scheduler_job_runner.py#L1264 needs to be updated for DAGs that don't have previous runs (see the sketch after this bullet).
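   
   A hedged sketch of that filter change, assuming the existing `DatasetEvent`, `DatasetDagRunQueue`, `DagRun`, and `DagModel` models; the helper name is hypothetical, and `last_parsed_time` is only a stand-in for "the time the DAG became active":
   
   ```python
   from sqlalchemy import exists, select
   from sqlalchemy.orm import Session
   
   from airflow.models.dag import DagModel
   from airflow.models.dagrun import DagRun
   from airflow.models.dataset import DatasetDagRunQueue, DatasetEvent
   
   
   def dataset_events_for_dag(dag_id: str, session: Session):
       """Return queued dataset events for a DAG, skipping events that predate
       the DAG when the DAG has never run (illustrative only)."""
       has_runs = session.query(exists().where(DagRun.dag_id == dag_id)).scalar()
   
       query = (
           select(DatasetEvent)
           .join(
               DatasetDagRunQueue,
               DatasetDagRunQueue.dataset_id == DatasetEvent.dataset_id,
           )
           .where(DatasetDagRunQueue.target_dag_id == dag_id)
       )
       if not has_runs:
           # Brand-new DAG: only count events emitted after the DAG appeared.
           activated_at = (
               session.query(DagModel.last_parsed_time)
               .filter(DagModel.dag_id == dag_id)
               .scalar()
           )
           if activated_at is not None:
               query = query.where(DatasetEvent.timestamp >= activated_at)
       return session.scalars(query).all()
   ```
   
   Whatever the final condition ends up being, the key point is that the extra filter only kicks in when the DAG has no previous runs, so existing consumers keep their current behaviour.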
   
   The above will fix the broader issue of all historical events being made available to a newly created DAG for a dataset that has been in the system for quite some time, as well as events being made available to a DAG that was deleted and then came back.

