tosheer opened a new issue, #38826:
URL: https://github.com/apache/airflow/issues/38826

   ### Apache Airflow version
   
   2.9.0b2
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   * If a new dataset-triggered DAG is created for an already existing dataset
   (the dataset already has dataset events):
       * DAG `catchup=False`:
           * The DAG sees all dataset events, from the very first event for the dataset.
       * DAG `catchup=True`:
           * The DAG sees all dataset events, from the very first event for the dataset.
   * If a dataset-triggered DAG is disabled and then re-enabled (dataset events
   arrive between disable and enable):
       * DAG `catchup=False`:
           * The DAG sees all dataset events received while it was disabled.
       * DAG `catchup=True`:
           * The DAG sees all dataset events received while it was disabled.
   * If a dataset-triggered DAG is deleted and then recreated with the same
   `dag_id` (dataset events arrive between delete and recreate):
       * DAG `catchup=False`:
           * The DAG sees all dataset events received while it was not enabled.
       * DAG `catchup=True`:
           * The DAG sees all dataset events received while it was deleted.
   
   ### What you think should happen instead?
   
   * If a new dataset-triggered DAG is created for an already existing dataset
   (the dataset already has dataset events):
       * DAG `catchup=False`:
           * Expected behaviour: the DAG should only see dataset events from the
   time it was enabled.
           * Actual behaviour: the DAG sees all dataset events, from the very
   first event for the dataset.
       * DAG `catchup=True`:
           * Expected behaviour: the DAG should see dataset events from the time
   it was created.
           * Actual behaviour: the DAG sees all dataset events, from the very
   first event for the dataset.
   * If a dataset-triggered DAG is disabled and then re-enabled (dataset events
   arrive between disable and enable):
       * DAG `catchup=False`:
           * Expected behaviour: the DAG should only see dataset events from the
   time it was re-enabled.
           * Actual behaviour: the DAG sees all dataset events received while it
   was disabled.
       * DAG `catchup=True`:
           * Expected behaviour: the DAG should see all dataset events received
   while it was disabled (this matches the actual behaviour).
           * Actual behaviour: the DAG sees all dataset events received while it
   was disabled.
   * If a dataset-triggered DAG is deleted and then recreated with the same
   `dag_id` (dataset events arrive between delete and recreate):
       * DAG `catchup=False`:
           * Expected behaviour: the DAG should only see dataset events from the
   time it was enabled, not the events received while it was deleted or
   recreated but not yet enabled.
           * Actual behaviour: the DAG sees all dataset events received while it
   was not enabled.
       * DAG `catchup=True`:
           * Expected behaviour: the DAG should see dataset events from the time
   it was recreated, not the events received while it was deleted.
           * Actual behaviour: the DAG sees all dataset events received while it
   was deleted.
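
   The expected behaviour above boils down to a cutoff filter: with
   `catchup=False`, only dataset events recorded after the DAG was
   (re-)enabled should count toward triggering it. A minimal sketch of that
   rule (a hypothetical model for illustration, not Airflow's internal
   scheduler logic; the event dicts and `events_pending_for_dag` helper are
   made up for this example):

   ```python
   from datetime import datetime, timezone

   # Hypothetical model: each dataset event carries the time it was recorded.
   # With catchup=False, only events at or after the cutoff (the time the DAG
   # was enabled) should be considered pending for the consumer DAG.
   def events_pending_for_dag(events, dag_enabled_at):
       """Filter dataset events to those received after the DAG was enabled."""
       return [e for e in events if e["timestamp"] >= dag_enabled_at]

   # Example: three producer runs; the consumer DAG is enabled between the
   # second and third event, so only the third should trigger it.
   events = [
       {"id": 1, "timestamp": datetime(2024, 4, 1, tzinfo=timezone.utc)},
       {"id": 2, "timestamp": datetime(2024, 4, 2, tzinfo=timezone.utc)},
       {"id": 3, "timestamp": datetime(2024, 4, 4, tzinfo=timezone.utc)},
   ]
   enabled_at = datetime(2024, 4, 3, tzinfo=timezone.utc)

   pending = events_pending_for_dag(events, enabled_at)
   print([e["id"] for e in pending])  # expected per this report: [3]
   ```

   The actual behaviour reported above corresponds to applying no cutoff at
   all: every event ever recorded for the dataset is treated as pending.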
   
   ### How to reproduce
   
   * Create a producer DAG that emits events for a dataset, and enable it.
   * Run the producer DAG multiple times.
   * Create a consumer DAG that is triggered by the same dataset, with
   `catchup=False`.
   * Enable the new consumer DAG.
   * The consumer DAG runs as soon as it is enabled and sees the very first
   event for the dataset.
   * On its next scheduled run, the consumer DAG gets all the remaining events
   other than the first one it already consumed.
   
   ```
   from __future__ import annotations
   
   import pendulum
   
   from airflow.datasets import Dataset
   from airflow.models.dag import DAG
   from airflow.operators.bash import BashOperator
   
   # [START dataset_def]
   poc_dag1_dataset = Dataset("test-cluster/test-schema/test-table")
   poc_dag2_dataset = Dataset("test-cluster/test-schema/test-table2")
   poc_dataset_consumes_check = Dataset("test-cluster/test-schema/test-table-check")
   poc_dataset_consumes_check2 = Dataset("test-cluster/test-schema/test-table-check2")
   
   
   def _write2_post_execute(context, _):
       context["dataset_events"]["test-cluster/test-schema/test-table"].extra = 
{"x_tk": 1}
   
   
   with DAG(
       dag_id="poc_dataset_produces_1",
       catchup=False,
       start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
       schedule="@daily",
       tags=["produces", "dataset-scheduled"],
   ) as dag1:
       # [START task_outlet]
       BashOperator(
           outlets=[poc_dag1_dataset], 
           task_id="producing_task_1", 
           bash_command="sleep 5",
           post_execute=_write2_post_execute,
       )
       # [END task_outlet]
   ```
   
   After this, add:
   ```
   # [START dag_dep]
   with DAG(
       dag_id="poc_dataset_consumes_1",
       catchup=False,
       start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
       schedule=[poc_dag1_dataset],
       tags=["consumes", "dataset-scheduled"],
   ) as dag3:
       # [END dag_dep]
       BashOperator(
           task_id="consuming_1",
           bash_command='echo "ti_key={{ triggering_dataset_events }}"',
       )
   ```
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   Plain vanilla Airflow.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

