RobbertDM opened a new issue, #41305:
URL: https://github.com/apache/airflow/issues/41305

   ### Apache Airflow version
   
   2.9.3
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   I want to use [`DatasetOrTimeSchedule`](https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/timetable.html#dataset-event-based-scheduling-with-time-based-scheduling) as a `schedule` in my DAG.
   
   When I do that and wait a day for both the time schedule and the dataset trigger from another DAG to fire, I do see 2 DAG runs: one triggered by the time schedule and one by the dataset (it has the dataset icon as well).
   The problem is that the time-triggered run (displayed first) is `queued` and the dataset-triggered run is `running`, but all of the tasks have `no status`. They stay in this state indefinitely without anything running.
   
   What's also odd is that the time-triggered run was queued just one second after the dataset-triggered one.
   Time-triggered run:
   Queued at: `2024-08-07, 02:01:45 CEST`
   Dataset-triggered run:
   Queued at: `2024-08-07, 02:01:44 CEST`
   Started: `2024-08-07, 02:01:44 CEST`
   
   I find this odd because, according to its schedule, the time-triggered run should have been queued at 02:00:00 CEST.
   
   I should note I have `max_active_runs=1` and `depends_on_past=True`.
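
One possible reading of how these two settings could interact is sketched below as a toy model in plain Python. It is not Airflow's scheduler code and is not confirmed as the cause. It assumes that `depends_on_past` makes a run wait for the run with the previous logical date, and that the time-triggered run has the earlier logical date (02:00:00 versus 02:01:44). `Run`, `step`, `simulate` and `make_runs` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Run:
    name: str
    logical_date: datetime  # assumed ordering key for depends_on_past
    state: str  # "queued", "running" or "success"


def step(runs, max_active_runs=1, depends_on_past=True):
    """Advance the toy scheduler once; return True if any state changed."""
    ordered = sorted(runs, key=lambda r: r.logical_date)
    changed = False
    # A running run finishes unless depends_on_past blocks it on its predecessor.
    for i, run in enumerate(ordered):
        if run.state != "running":
            continue
        blocked = depends_on_past and i > 0 and ordered[i - 1].state != "success"
        if not blocked:
            run.state = "success"
            changed = True
    # A queued run is promoted only while a run slot is free.
    for run in ordered:
        active = sum(r.state == "running" for r in runs)
        if run.state == "queued" and active < max_active_runs:
            run.state = "running"
            changed = True
    return changed


def simulate(runs, **kwargs):
    """Step until nothing changes and report the final state of each run."""
    while step(runs, **kwargs):
        pass
    return {r.name: r.state for r in runs}


def make_runs():
    # States as observed in the UI; logical dates are assumptions.
    return [
        Run("time", datetime(2024, 8, 7, 2, 0, 0), "queued"),
        Run("dataset", datetime(2024, 8, 7, 2, 1, 44), "running"),
    ]


stuck = simulate(make_runs(), max_active_runs=1, depends_on_past=True)
print(stuck)
```

In this model the dataset-triggered run holds the only run slot while waiting for the time-triggered run, which can never leave `queued`. Raising `max_active_runs` to 2 or disabling `depends_on_past` lets both runs finish in the model, which may be worth trying as a workaround on a real deployment.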
   
   ### What you think should happen instead?
   
   The Dataset trigger should not influence the time schedule. They should run 
independently. The time schedule run should be queued and start running at its 
scheduled time. The tasks should actually run. There should be no deadlock.
   
   ### How to reproduce
   
   From the docs:
   ```
   from airflow.decorators import dag
   from airflow.timetables.datasets import DatasetOrTimeSchedule
   from airflow.timetables.trigger import CronTriggerTimetable

   # dag1_dataset and dag2_dataset are assumed to be airflow.datasets.Dataset
   # objects defined elsewhere.


   @dag(
       schedule=DatasetOrTimeSchedule(
           timetable=CronTriggerTimetable("0 1 * * 3", timezone="UTC"),
           datasets=(dag1_dataset | dag2_dataset),
       )
       # Additional arguments here, replace this comment with actual arguments
   )
   def example_dag():
       # DAG tasks go here
       pass
   ```
   
   Set `max_active_runs=1` and `depends_on_past=True` on this DAG.
   Have a second DAG update `dag1_dataset` or `dag2_dataset` so that it triggers this DAG.
   
   ### Operating System
   
   Our Airflow runs on Kubernetes on EKS.
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   We use Conveyor: https://docs.conveyordata.com/technical-reference/airflow/airflow-installation-details
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
