Have you looked into subdags?

Brian
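For concreteness, a minimal sketch of the subdag approach on the Airflow 1.x
API follows; the PARTITION_KEYS list and the train_model callable are
hypothetical stand-ins for the real partition values and model-training code:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator
    from airflow.operators.subdag_operator import SubDagOperator

    # Hypothetical partition values; in practice these might come from config.
    PARTITION_KEYS = ["us", "eu", "apac"]

    default_args = {"owner": "airflow", "start_date": datetime(2017, 8, 1)}


    def train_model(partition_key):
        # Stand-in for the actual R/Python model training on one partition.
        print("training model for partition %s" % partition_key)


    def make_training_subdag(parent_dag_id, task_id):
        # A subdag's dag_id must be "<parent_dag_id>.<task_id>", and its
        # schedule_interval should match the parent's.
        subdag = DAG(
            dag_id="%s.%s" % (parent_dag_id, task_id),
            default_args=default_args,
            schedule_interval="@daily",
        )
        # One independent task per partition key; they run in parallel,
        # subject to executor and pool limits.
        for key in PARTITION_KEYS:
            PythonOperator(
                task_id="train_%s" % key,
                python_callable=train_model,
                op_kwargs={"partition_key": key},
                dag=subdag,
            )
        return subdag


    dag = DAG("model_training", default_args=default_args,
              schedule_interval="@daily")

    SubDagOperator(
        task_id="train_by_partition",
        subdag=make_training_subdag(dag.dag_id, "train_by_partition"),
        dag=dag,
    )

Note that the subdag mainly groups the per-partition tasks in the UI;
generating the same PythonOperators directly in the main DAG gives the same
parallel fan-out.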
> On Aug 9, 2017, at 10:16 AM, Ashish Rawat <[email protected]> wrote:
>
> Thanks George. Our use case also involves periodic scheduling (daily) as
> well as task dependencies, which is why we chose Airflow. However, some of
> the tasks in a DAG have now become too big to execute on one node, and we
> want to split them into multiple tasks to reduce execution time. Would you
> recommend firing parts of an Airflow DAG in another framework?
>
> --
> Regards,
> Ashish
>
>> On 09-Aug-2017, at 10:40 PM, George Leslie-Waksman
>> <[email protected]> wrote:
>>
>> Airflow is best for situations where you want to run different tasks that
>> depend on each other or process data that arrives over time. If your goal
>> is to take a large dataset, split it up, and process chunks of it, there
>> are probably other tools better suited to your purpose.
>>
>> Off the top of my head, you might consider Dask:
>> https://dask.pydata.org/en/latest/ or directly using Celery:
>> http://www.celeryproject.org/
>>
>> --George
>>
>> On Wed, Aug 9, 2017 at 9:52 AM Ashish Rawat <[email protected]> wrote:
>>
>>> Hi - Can anyone please provide some pointers for this use case with
>>> Airflow?
>>>
>>> --
>>> Regards,
>>> Ashish
>>>
>>>> On 03-Aug-2017, at 9:13 PM, Ashish Rawat <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We have a use case where we are running some R/Python-based data
>>>> science models, which execute on a single node. The execution time of
>>>> the models is constantly increasing, and we are now planning to split
>>>> the model training by a partition key and distribute the workload
>>>> across multiple machines.
>>>>
>>>> Does Airflow provide some simple way to split a task into multiple
>>>> tasks, each of which works on a specific value of the key?
>>>>
>>>> --
>>>> Regards,
>>>> Ashish
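For reference, a comparable sketch of the Dask route George suggests above;
PARTITION_KEYS and train_model are again hypothetical placeholders for the
real partition values and training code:

    from dask import compute, delayed

    # Hypothetical partition values and training function, as above.
    PARTITION_KEYS = ["us", "eu", "apac"]


    def train_model(partition_key):
        # Train the model on a single partition; return something picklable.
        return partition_key


    # Build one lazy task per partition key, then run them all in parallel.
    tasks = [delayed(train_model)(key) for key in PARTITION_KEYS]
    results = compute(*tasks)

By default compute() uses a local scheduler (threads or processes); pointing
it at a dask.distributed cluster spreads the same calls across machines.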
