I will give it a try, thanks Brian!

--
Regards,
Ashish
> On 09-Aug-2017, at 11:02 PM, Van Klaveren, Brian N. <b...@slac.stanford.edu> wrote:
>
> Hi Ashish,
>
> Partitioned tasks might be modeled as triggered, n-many parameterized dags/subdags (where the parameter is the partition key). I've used this pattern a lot in the past in other systems, but not specifically with Airflow, so I'm not sure exactly how you'd implement it in Airflow, but hopefully this gives you some ideas.
>
> Brian
>
>> On Aug 9, 2017, at 10:23 AM, Ashish Rawat <ashish.ra...@myntra.com> wrote:
>>
>> Yes, I believe they are used for splitting a bigger DAG into smaller DAGs, for clarity and reusability. In our use case, we need to split/replicate a specific task into multiple tasks based on the different values of a key, essentially data partitioning and processing.
>>
>> --
>> Regards,
>> Ashish
>>
>>> On 09-Aug-2017, at 10:49 PM, Van Klaveren, Brian N. <b...@slac.stanford.edu> wrote:
>>>
>>> Have you looked into subdags?
>>>
>>> Brian
>>>
>>>> On Aug 9, 2017, at 10:16 AM, Ashish Rawat <ashish.ra...@myntra.com> wrote:
>>>>
>>>> Thanks George. Our use case also involves periodic scheduling (daily), as well as task dependencies, so we chose Airflow for it. However, some of the tasks in a DAG have now become too big to execute on one node, and we want to split them into multiple tasks to reduce execution time. Would you recommend firing parts of an Airflow DAG in another framework?
>>>>
>>>> --
>>>> Regards,
>>>> Ashish
>>>>
>>>>> On 09-Aug-2017, at 10:40 PM, George Leslie-Waksman <geo...@cloverhealth.com.INVALID> wrote:
>>>>>
>>>>> Airflow is best for situations where you want to run different tasks that depend on each other or process data that arrives over time. If your goal is to take a large dataset, split it up, and process chunks of it, there are probably other tools better suited to your purpose.
>>>>> Off the top of my head, you might consider Dask (https://dask.pydata.org/en/latest/) or directly using Celery (http://www.celeryproject.org/).
>>>>>
>>>>> --George
>>>>>
>>>>> On Wed, Aug 9, 2017 at 9:52 AM Ashish Rawat <ashish.ra...@myntra.com> wrote:
>>>>>
>>>>>> Hi - Can anyone please provide some pointers for this use case with Airflow?
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Ashish
>>>>>>
>>>>>>> On 03-Aug-2017, at 9:13 PM, Ashish Rawat <ashish.ra...@myntra.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We have a use case where we are running some R/Python-based data science models, which execute on a single node. The execution time of the models is constantly increasing, and we are now planning to split the model training by a partition key and distribute the workload over multiple machines.
>>>>>>>
>>>>>>> Does Airflow provide a simple way to split a task into multiple tasks, each of which works on a specific value of the key?
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Ashish
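[Editor's note: the "one task per partition key" pattern this thread converges on can be sketched outside Airflow with nothing but the Python standard library. The names below (`PARTITION_KEYS`, `train_model`, `run_partitioned`) are hypothetical stand-ins, not from the thread; in Airflow the same loop would instead emit one operator or parameterized subdag per key inside a DAG, and a real workload would use processes or a cluster (Dask/Celery) rather than threads.]

```python
# Minimal sketch of fan-out by partition key (illustrative names only).
from concurrent.futures import ThreadPoolExecutor

PARTITION_KEYS = ["2017-08-01", "2017-08-02", "2017-08-03"]  # hypothetical keys

def train_model(partition_key):
    # Placeholder for the per-partition R/Python model-training step.
    return (partition_key, "trained")

def run_partitioned(keys):
    # One "task" per partition key, run in parallel. In Airflow, this loop
    # would instead create one operator (or subdag) per key inside a DAG.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(train_model, keys))

if __name__ == "__main__":
    print(run_partitioned(PARTITION_KEYS))
```

The key design point is that the set of partition keys is known when the work is laid out, so the fan-out can be expressed as a plain loop, whether that loop builds thread-pool futures, Celery tasks, or per-key Airflow operators.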