Hi Sibabrata - I discussed your use case with some of our data engineers, and we may have some recommendations on how to better execute this. Let us know if you want to chat.
-Ry
Airflow Committer + Founder/CTO of Astronomer

On Mon, Sep 7, 2020 at 4:30 AM Sibabrata Pattanaik (spattana) <[email protected]> wrote:

> Hello Team,
>
> Currently we are using Airflow version 1.10.10 for data ingestion.
>
> In our DAG, we create tasks dynamically based on data volume, i.e. if the
> data volume is high, the number of parallel tasks increases, and if the
> data volume is low, the number of parallel tasks decreases in the next
> run, or vice versa. As every DAG run updates the same table, we set
> 'wait_for_downstream' to True to maintain data consistency and make sure
> the next run does not start if the previous run is in progress or has
> failed.
>
> In this scenario, we are seeing one issue: if the previous run has fewer
> tasks than the current one because of dynamic task creation, the current
> DAG run stays in a waiting state indefinitely. The current run contains
> new tasks that were generated for this run and do not exist in the
> previous run, yet it waits for those same tasks to reach a completed state
> in the previous run. As soon as we manually mark those tasks as completed
> in the previous DAG run, the current DAG run starts running.
>
> Let me know if you have any workaround for this scenario.
>
> Thanks
> Sibabrata Pattanaik
> --------------------------
> [email protected]
> VOIP 84260416
> +91 80 44260416
> --------------------------
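
A minimal sketch of one possible workaround for the scenario described in the quoted message, assuming the DAG can keep a fixed set of task IDs and vary only the amount of work each task receives. The names `MAX_SLOTS`, `slot_task_id`, and `assign_chunks` are hypothetical and not part of Airflow; the sketch only shows the partitioning logic, not the DAG definition itself.

```python
from typing import Dict, List

# Hypothetical upper bound on parallel ingest tasks; tune to the peak data volume.
MAX_SLOTS = 4


def slot_task_id(slot: int) -> str:
    """Return a task ID that stays identical from one DAG run to the next."""
    return f"ingest_slot_{slot}"


def assign_chunks(chunks: List[str], max_slots: int = MAX_SLOTS) -> Dict[str, List[str]]:
    """Spread data chunks round-robin over a fixed number of task slots.

    Every slot appears in the result even when it receives no chunks, so the
    set of task IDs is the same for low-volume and high-volume runs.
    """
    plan: Dict[str, List[str]] = {slot_task_id(i): [] for i in range(max_slots)}
    for index, chunk in enumerate(chunks):
        plan[slot_task_id(index % max_slots)].append(chunk)
    return plan


if __name__ == "__main__":
    # Low-volume run: one chunk, three slots stay empty.
    print(assign_chunks(["chunk_a"]))
    # High-volume run: six chunks shared by the same four slots.
    print(assign_chunks(["chunk_a", "chunk_b", "chunk_c", "chunk_d", "chunk_e", "chunk_f"]))
```

Each slot would map to one task in the DAG, with a slot that receives an empty list simply returning without doing any work. Because the task IDs never change, every task in the current run has a counterpart in the previous run for 'wait_for_downstream' to check. The trade-off is that parallelism is capped at `MAX_SLOTS`.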
