Airflow is best for situations where you want to run different tasks that depend on each other or process data that arrives over time. If your goal is to take a large dataset, split it up, and process chunks of it, there are probably other tools better suited to your purpose.
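That said, if you do want to stay inside Airflow, the usual pattern is to generate one task per partition key when the DAG file is parsed. A rough sketch (the key list and the training function are placeholders for your own code):

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

PARTITION_KEYS = ["north", "south", "east", "west"]  # stand-ins for your keys

def train_model(partition_key):
    # call your existing R/Python training code on one slice of the data
    print("training model for partition %s" % partition_key)

dag = DAG(
    dag_id="partitioned_model_training",
    start_date=datetime(2017, 8, 1),
    schedule_interval="@daily",
)

for key in PARTITION_KEYS:
    PythonOperator(
        task_id="train_%s" % key,
        python_callable=train_model,
        op_kwargs={"partition_key": key},
        dag=dag,
    )

With the CeleryExecutor, those per-key tasks can be picked up by different worker machines, but note the key list has to be known when the DAG is parsed.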
Off the top of my head, you might consider Dask: https://dask.pydata.org/en/latest/ or directly using Celery: http://www.celeryproject.org/ (a rough Dask sketch follows below the quoted thread).

--George

On Wed, Aug 9, 2017 at 9:52 AM Ashish Rawat <ashish.ra...@myntra.com> wrote:

> Hi - Can anyone please provide some pointers for this use case over
> Airflow?
>
> --
> Regards,
> Ashish
>
> > On 03-Aug-2017, at 9:13 PM, Ashish Rawat <ashish.ra...@myntra.com> wrote:
> >
> > Hi,
> >
> > We have a use case where we are running some R/Python based data
> > science models, which execute over a single node. The execution time
> > of the models is constantly increasing, and we are now planning to
> > split the model training by a partition key and distribute the
> > workload over multiple machines.
> >
> > Does Airflow provide some simple way to split a task into multiple
> > tasks, all of which will work on a specific value of the key?
> >
> > --
> > Regards,
> > Ashish
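To make the Dask suggestion concrete, here is a rough sketch of fanning the training out over a cluster with dask.distributed (the keys and the training function are again placeholders; Client() with no arguments starts a local cluster, so you would point it at your real scheduler instead):

from dask.distributed import Client

def train_partition(partition_key):
    # stand-in for your real single-node training code on one slice
    return "model for %s" % partition_key

client = Client()  # e.g. Client("tcp://scheduler-host:8786") for a real cluster

keys = ["north", "south", "east", "west"]   # stand-ins for your partition keys
futures = client.map(train_partition, keys)  # one task per key, spread over workers
results = client.gather(futures)             # block until every model is trained

Celery would look much the same: one task per key, dispatched as a group and collected when they all finish.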