potiuk commented on issue #37119:
URL: https://github.com/apache/airflow/issues/37119#issuecomment-1920070412

   No. Same as in the discussion you quoted - if you want to generate a 
sequential list of tasks at runtime, there is no such feature in Airflow - not 
with Airflow's definition of tasks. If you want N sequentially executed, 
independent tasks, the number of them and the dependencies between them (i.e. 
the DAG structure) MUST be set at parsing time. There is no such feature in 
Airflow. One of the reasons is that if you could dynamically create such 
tasks, you would need to dynamically create dependencies, and that might mean, 
for example, that your DAG stops being a DAG - it could dynamically become a 
graph with a cycle. That's why you cannot add dependencies dynamically: the 
whole DAG graph must be resolved before the scheduler starts scheduling 
tasks, because it has to calculate the dependencies.
   
   What you can do, however (since you do not want to use Airflow's 
parallelism feature and distribute such sequential tasks among different 
nodes) - you can write your own "sequential execution task" that uses the EMR 
hook and simply executes your tasks in a loop, one by one. That loop can then 
be arbitrarily long and dynamic.
   
   In this case you will not get UI visualisation, retries, partial reruns, 
clearing and the like. But you will get the basic Nx tasks executed in 
sequence.
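   A minimal, hedged sketch of that idea - `run_step` here is a hypothetical stand-in for whatever EMR hook call you would make; the point is only the sequential loop inside a single task:

   ```python
   # Hypothetical sketch: execute an arbitrary, dynamic list of steps
   # sequentially inside one Airflow task. `run_step` stands in for your
   # real EMR hook call (e.g. submit a step and wait for completion).

   def run_steps_sequentially(steps, run_step):
       """Run step configs one by one, in order, within a single task."""
       results = []
       for step in steps:
           results.append(run_step(step))
       return results

   # Example with a dummy runner instead of a real EMR hook:
   steps = [{"name": f"step-{i}"} for i in range(3)]
   out = run_steps_sequentially(steps, lambda s: f"done:{s['name']}")
   ```

   The list of steps can come from anywhere at runtime (a yaml file, an API call), because from Airflow's point of view this is just one task.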
   
   You can also emulate this a bit by assigning a 1-slot pool to all tasks in 
a group, where your tasks will compete for the pool slot. But this does not 
guarantee the sequence of execution: all your dynamically mapped tasks will 
technically still be running in parallel, but with parallelism = 1 - which 
means one at a time, in an undefined (random) order.
   
   If you also relax your "runtime" expectation to less-than-runtime (i.e. 
for example the structure changing for all runs whenever your yaml file 
changes - let's say once a day or once a week), then you could generate your 
DAG from such a yaml file using Dynamic DAG generation, not Dynamic Task 
Mapping: you simply create the tasks and set the dependencies between them in 
the Python code when your file is parsed. Roughly:
   
   ```python
   @dag
   def my_dag():
       y = read_yaml()
       previous_task = None
       for task_conf in y.tasks:
           task = make_emr_task(task_conf)  # build your EMR operator here
           if previous_task is not None:
               previous_task >> task
           previous_task = task
   ```
   
   This is the absolutely classic way of generating DAGs, explained in our 
docs - even explicitly showing a yaml file: 
https://airflow.apache.org/docs/apache-airflow/stable/howto/dynamic-dag-generation.html#dynamic-dags-with-external-configuration-from-a-structured-data-file
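   The chaining logic such a generated DAG relies on can be illustrated without Airflow at all - `FakeTask` below is a hypothetical stand-in that emulates the `>>` dependency operator of real operators:

   ```python
   # Hypothetical sketch of parse-time DAG generation: build a linear chain
   # of tasks from a list of names. FakeTask emulates Airflow's `>>`.

   class FakeTask:
       def __init__(self, name):
           self.name = name
           self.downstream = []

       def __rshift__(self, other):
           # mimic `previous_task >> task` registering a dependency
           self.downstream.append(other)
           return other

   def build_chain(names):
       previous, tasks = None, []
       for name in names:
           task = FakeTask(name)
           if previous is not None:
               previous >> task   # each task depends on the one before it
           previous = task
           tasks.append(task)
       return tasks

   chain = build_chain(["extract", "transform", "load"])
   ```

   Because all of this runs at parse time, the scheduler sees a fully resolved linear graph - which is exactly why the yaml must change slowly, as noted below.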
   
   But then, if it changes often and wildly, it's no good. Such a DAG should 
change slowly - far less frequently (orders of magnitude) than the frequency 
of DAG runs.
   
   And yes, @nathadfield made a good suggestion for whatever you do (if you 
decide to use Airflow for this - somewhat niche in the Airflow world - case 
of setting up and tearing down your cluster).
   
   
   Those are - I believe - all the options you have in Airflow right now.
   
   But as usual for those who have niche cases - if you can figure out a 
mental model where this is generic enough and implementable, proposals on 
that are welcome. My feeling is that a use case of this caliber somewhat 
calls for an Airflow Improvement Proposal, because (if you want to stick to 
runtime properties) it calls for a feature allowing a subset of DAG structure 
modifications that do not change the properties of the graph - for example, 
just expanding a linear graph branch by injecting new tasks into that branch.
   
   And BTW, I am converting this into a discussion. This is not an issue - it 
is a discussion of a niche case you have, which is likely not necessarily a 
good fit for Airflow right now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
