Yes, but I believe subdags are meant for splitting a bigger DAG into smaller DAGs, for clarity and reusability. In our use case, we need to split/replicate a specific task into multiple tasks based on the different values of a key: essentially data partitioning and parallel processing.
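
For concreteness, the pattern we are after looks roughly like the sketch below: a minimal Airflow 1.x DAG that fans a training job out into one task per partition key. PARTITION_KEYS and train_model are hypothetical placeholders for our actual key list and model code.

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# Hypothetical key list; in practice this would be derived from our data.
PARTITION_KEYS = ["north", "south", "east", "west"]

def train_model(partition_key):
    """Placeholder: train the model on the data slice for one key."""
    print("training model for partition %s" % partition_key)

dag = DAG(
    dag_id="partitioned_model_training",
    start_date=datetime(2017, 8, 1),
    schedule_interval="@daily",
)

# One independent task per key; with a distributed executor
# (e.g. CeleryExecutor) these can run concurrently on separate workers.
for key in PARTITION_KEYS:
    PythonOperator(
        task_id="train_model_%s" % key,
        python_callable=train_model,
        op_kwargs={"partition_key": key},
        dag=dag,
    )

The limitation, as far as I can tell, is that the set of keys has to be known when the DAG file is parsed, since Airflow builds the task graph at definition time rather than at run time.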
--
Regards,
Ashish


> On 09-Aug-2017, at 10:49 PM, Van Klaveren, Brian N. <[email protected]> wrote:
>
> Have you looked into subdags?
>
> Brian
>
>
>> On Aug 9, 2017, at 10:16 AM, Ashish Rawat <[email protected]> wrote:
>>
>> Thanks George. Our use case also requires periodic scheduling (daily), as
>> well as task dependencies, so we chose Airflow for it. However, some of
>> the tasks in a DAG have now become too big to execute on one node, and we
>> want to split them into multiple tasks to reduce execution time. Would you
>> recommend firing parts of an Airflow DAG in another framework?
>>
>> --
>> Regards,
>> Ashish
>>
>>
>>
>>> On 09-Aug-2017, at 10:40 PM, George Leslie-Waksman
>>> <[email protected]> wrote:
>>>
>>> Airflow is best for situations where you want to run different tasks that
>>> depend on each other or process data that arrives over time. If your goal
>>> is to take a large dataset, split it up, and process chunks of it, there
>>> are probably other tools better suited to your purpose.
>>>
>>> Off the top of my head, you might consider Dask:
>>> https://dask.pydata.org/en/latest/ or directly using Celery:
>>> http://www.celeryproject.org/
>>>
>>> --George
>>>
>>> On Wed, Aug 9, 2017 at 9:52 AM Ashish Rawat <[email protected]> wrote:
>>>
>>>> Hi - Can anyone please provide some pointers for this use case over
>>>> Airflow?
>>>>
>>>> --
>>>> Regards,
>>>> Ashish
>>>>
>>>>
>>>>
>>>>> On 03-Aug-2017, at 9:13 PM, Ashish Rawat <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We have a use case where we are running some R/Python-based data science
>>>>> models, which execute on a single node. The execution time of the models
>>>>> is constantly increasing, and we are now planning to split the model
>>>>> training by a partition key and distribute the workload over multiple
>>>>> machines.
>>>>>
>>>>> Does Airflow provide some simple way to split a task into multiple
>>>>> tasks, all of which will work on a specific value of the key?
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Ashish
