tosheer commented on issue #38826: URL: https://github.com/apache/airflow/issues/38826#issuecomment-2045595004
@uranusjr I agree, adding `catchup` into the mix for datasets will introduce a lot of complexity; for now we can avoid that. But we need to tackle this issue on both the publishing side and the consumption side.

**Publishing Side:**
- As soon as a DAG is deleted, remove its entry from `dag_schedule_dataset_reference` so that no entries are added to the `dataset_dag_run_queue` table for any of its dataset events. This is mainly to avoid a deleted DAG coming back and picking up all the events since its last run (see the first sketch below).

**Consumption Side:**
- For any new DAG, only deliver events from the time it becomes active, and no past events. For this, the filter on the dataset event query at https://github.com/apache/airflow/blob/main/airflow/jobs/scheduler_job_runner.py#L1264 needs to be updated for DAGs that have no previous runs (see the second sketch below).

The above will fix the broader issue of all historical events being available to a newly created DAG for a dataset that has been in the system for quite some time, as well as events being available to a DAG that was deleted and later re-created.
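For the publishing side, a minimal sketch of the cleanup that could be hooked into DAG deletion, assuming the Airflow 2.x dataset models in `airflow.models.dataset`. `cleanup_dataset_references` is a hypothetical helper, not an existing Airflow function, and where exactly it gets called from (e.g. the DAG deletion path) is left aside here:

```python
from airflow.models.dataset import DagScheduleDatasetReference, DatasetDagRunQueue
from airflow.utils.session import create_session


def cleanup_dataset_references(dag_id: str) -> None:
    """Hypothetical cleanup for a DAG that is being deleted.

    Dropping the dag_schedule_dataset_reference rows stops the scheduler from
    queueing further dataset events for this DAG, and clearing any rows already
    sitting in dataset_dag_run_queue avoids a stale backlog if the DAG is
    later re-created.
    """
    with create_session() as session:
        session.query(DagScheduleDatasetReference).filter(
            DagScheduleDatasetReference.dag_id == dag_id
        ).delete(synchronize_session=False)
        session.query(DatasetDagRunQueue).filter(
            DatasetDagRunQueue.target_dag_id == dag_id
        ).delete(synchronize_session=False)
```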

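For the consumption side, the real change would go into the scheduler query linked above; as an illustration of the intended filter only (not the scheduler's actual code), here is a standalone sketch. `dataset_events_since_subscription` is a hypothetical helper, and using `DagScheduleDatasetReference.created_at` as the "DAG became active" cutoff is an assumption:

```python
from __future__ import annotations

from sqlalchemy.orm import Session

from airflow.models.dataset import DagScheduleDatasetReference, DatasetEvent
from airflow.utils.session import NEW_SESSION, provide_session


@provide_session
def dataset_events_since_subscription(
    dag_id: str, dataset_id: int, session: Session = NEW_SESSION
) -> list[DatasetEvent]:
    """Fetch only the dataset events emitted after the DAG subscribed to the dataset."""
    # Assumed cutoff: the moment the DAG's dataset-schedule reference was registered.
    cutoff = (
        session.query(DagScheduleDatasetReference.created_at)
        .filter(
            DagScheduleDatasetReference.dag_id == dag_id,
            DagScheduleDatasetReference.dataset_id == dataset_id,
        )
        .scalar()
    )
    query = session.query(DatasetEvent).filter(DatasetEvent.dataset_id == dataset_id)
    if cutoff is not None:
        # Ignore historical events that predate this DAG's subscription.
        query = query.filter(DatasetEvent.timestamp >= cutoff)
    return query.all()
```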