Airflow is best for situations where you want to run different tasks that
depend on each other or process data that arrives over time. If your goal
is to take a large dataset, split it up, and process chunks of it, there
are probably other tools better suited to your purpose.
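
That said, if you do want to stay in Airflow, the usual pattern is to generate one task per partition key in a loop when the DAG file is parsed. Here is a rough, untested sketch, assuming a hypothetical train_model(key) function and a fixed list of keys:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    # Hypothetical partition keys and training function -- substitute your own.
    PARTITION_KEYS = ['key_a', 'key_b', 'key_c']

    def train_model(partition_key):
        # Fit the R/Python model on the data slice for this key.
        print('training model for partition %s' % partition_key)

    dag = DAG(
        dag_id='partitioned_model_training',
        start_date=datetime(2017, 8, 1),
        schedule_interval='@daily')

    # Generate one task per partition key at DAG-parse time; the scheduler
    # can then run them in parallel, each occupying one worker slot.
    for key in PARTITION_KEYS:
        PythonOperator(
            task_id='train_%s' % key,
            python_callable=train_model,
            op_kwargs={'partition_key': key},
            dag=dag)

The catch is that the keys have to be known when the DAG file is parsed, and each task still ties up a worker slot, so for purely data-parallel work a dedicated tool is usually a better fit.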

Off the top of my head, you might consider Dask:
https://dask.pydata.org/en/latest/ or directly using Celery:
http://www.celeryproject.org/
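
With Dask, for example, something along these lines (untested, assuming a dask.distributed scheduler is already running and the same hypothetical train_model(key) function) would fan the per-key training out over a cluster:

    from dask.distributed import Client

    # Hypothetical scheduler address and keys -- adjust for your setup.
    client = Client('tcp://scheduler-host:8786')
    partition_keys = ['key_a', 'key_b', 'key_c']

    def train_model(partition_key):
        # Load the data slice for this key and fit the model on one worker.
        return partition_key, 'trained'

    # One training job per key, executed in parallel across the workers.
    futures = client.map(train_model, partition_keys)
    results = client.gather(futures)

Celery's group primitive gives you a similar fan-out if you would rather reuse existing Celery workers.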

--George

On Wed, Aug 9, 2017 at 9:52 AM Ashish Rawat <ashish.ra...@myntra.com> wrote:

> Hi - Can anyone please provide some pointers for this use case over
> Airflow?
>
> --
> Regards,
> Ashish
>
>
>
> > On 03-Aug-2017, at 9:13 PM, Ashish Rawat <ashish.ra...@myntra.com> wrote:
> >
> > Hi,
> >
> > We have a use case where we are running some R/Python based data science
> > models, which execute on a single node. The execution time of the models
> > is constantly increasing, and we are now planning to split the model
> > training by a partition key and distribute the workload over multiple
> > machines.
> >
> > Does Airflow provide some simple way to split a task into multiple
> > tasks, each of which will work on a specific value of the key?
> >
> > --
> > Regards,
> > Ashish
