bhavaniravi opened a new issue #12647:
URL: https://github.com/apache/airflow/issues/12647
**Description**
Support runtime-dynamic workflows by allowing multiple task instances of the
same task during a DAG run.
**Use case / motivation**
A requirement that I see come up repeatedly across forums and other places is
to create a dynamic number of task B instances based on the output of
task A.
```
           |---> Task B.1 --|
           |---> Task B.2 --|
Task A --- |---> Task B.3 --|-----> Task C
           |      ....      |
           |---> Task B.N --|
```
A specific requirement in the above case is to parallelize data processing
while the specification of the task remains the same. Let's say the requirement
is to
1. Fetch `n` records from the data lake
2. Process the `n` records in task B, `preprocess_records`
3. Spin up multiple instances of task B so that the processing of the `n` records runs in parallel
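The three steps above can be sketched in plain Python, without any Airflow API, just to show the fan-out/fan-in shape being asked for. Only `preprocess_records` comes from this issue; `fetch_records`, `summarize`, `run_pipeline`, and the record layout are hypothetical names made up for illustration, and a thread pool stands in for whatever executor would run the task instances.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_records(n):
    """Task A: stand-in for reading n records from the data lake."""
    return [{"id": i, "value": i * 10} for i in range(n)]


def preprocess_records(record):
    """Task B: the same processing logic, applied to one record."""
    return {"id": record["id"], "value": record["value"] + 1}


def summarize(processed):
    """Task C: fan-in step that runs after every Task B copy has finished."""
    return sum(r["value"] for r in processed)


def run_pipeline(n, max_workers=4):
    records = fetch_records(n)  # the number of records is only known at run time
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One Task B invocation per record; map() preserves input order.
        processed = list(pool.map(preprocess_records, records))
    return summarize(processed)
```

The point of the sketch is that the number of Task B invocations is derived from Task A's output rather than fixed when the DAG is defined.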
*Inspiration from Argo* :: https://argoproj.github.io/argo/examples/#loops
**Idea**
With current Airflow, there is only one task instance per task during a DAG
run.
How about we provide an API where `Task A` can specify the number of task
instances to spin up for the downstream task?
Should this be a separate operator?
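To make the idea more concrete, here is a rough sketch of what such an expansion step might look like. This is not an existing Airflow API: `TaskInstance` here is a toy dataclass, and `expand_downstream` and `task_a` are hypothetical names used only to illustrate the upstream task deciding the instance count at run time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskInstance:
    """Toy stand-in for one run-time copy of a task (not Airflow's class)."""
    task_id: str
    index: int  # position of this instance among its siblings


def expand_downstream(task_id: str, count: int) -> List[TaskInstance]:
    """Hypothetical helper: create `count` instances of one downstream task."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [TaskInstance(task_id=f"{task_id}.{i + 1}", index=i) for i in range(count)]


def task_a() -> int:
    """Upstream task: decides at run time how many downstream instances are needed."""
    return 3


# The scheduler would do this expansion once task_a has finished.
instances = expand_downstream("task_b", task_a())
```

Each instance carries its own index, so the downstream task could use it to pick the slice of data it is responsible for.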
**Related Issues**
https://stackoverflow.com/questions/41517798/proper-way-to-create-dynamic-workflows-in-airflow