I will give it a try. Thanks, Brian!

--
Regards,
Ashish



> On 09-Aug-2017, at 11:02 PM, Van Klaveren, Brian N. <b...@slac.stanford.edu> 
> wrote:
> 
> Hi Ashish,
> 
> Partitioned tasks can often be modeled as n triggered, parameterized 
> dags/subdags, where the parameter is the partition key. I've used this 
> pattern a lot in other systems, though not specifically with Airflow, so 
> I'm not sure exactly how you'd implement it there, but hopefully this 
> gives you some ideas.
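> 
> Roughly, as an untested sketch (the worker dag id 'process_partition' and 
> the key list are made up), the controller side could look like:
> 
>     # Controller DAG: trigger one run of a parameterized worker DAG per
>     # partition key, passing the key along in the dag_run payload.
>     from datetime import datetime
>     from airflow import DAG
>     from airflow.operators.dagrun_operator import TriggerDagRunOperator
> 
>     PARTITION_KEYS = ['a', 'b', 'c']  # placeholder partition keys
> 
>     dag = DAG('trigger_partitions', start_date=datetime(2017, 8, 1),
>               schedule_interval='@daily')
> 
>     def make_payload(key):
>         # Each trigger sets the partition key on the dag_run payload,
>         # which the worker DAG reads back via dag_run.conf.
>         def _set_payload(context, dag_run_obj):
>             dag_run_obj.payload = {'partition_key': key}
>             return dag_run_obj
>         return _set_payload
> 
>     for key in PARTITION_KEYS:
>         TriggerDagRunOperator(task_id='trigger_%s' % key,
>                               trigger_dag_id='process_partition',
>                               python_callable=make_payload(key),
>                               dag=dag)
> 
> A task in the worker DAG would then pick the key back up from 
> context['dag_run'].conf['partition_key'].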
> 
> Brian
> 
> 
>> On Aug 9, 2017, at 10:23 AM, Ashish Rawat <ashish.ra...@myntra.com> wrote:
>> 
>> Yes, I believe subdags are meant for splitting a bigger DAG into smaller 
>> DAGs, for clarity and reusability. In our use case, we need to 
>> split/replicate a specific task into multiple tasks based on the distinct 
>> values of a key, which is essentially data partitioning and processing.
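>> 
>> For example (a rough sketch; the key values and task names are made up), 
>> what we are after is effectively one task per key value:
>> 
>>     # Generate one task per distinct value of the partition key,
>>     # all running the same processing logic in parallel.
>>     from datetime import datetime
>>     from airflow import DAG
>>     from airflow.operators.python_operator import PythonOperator
>> 
>>     PARTITION_KEYS = ['north', 'south', 'east', 'west']  # made-up values
>> 
>>     dag = DAG('train_models', start_date=datetime(2017, 8, 1),
>>               schedule_interval='@daily')
>> 
>>     def train_model(partition_key):
>>         pass  # run the model on this partition of the data
>> 
>>     for key in PARTITION_KEYS:
>>         PythonOperator(task_id='train_model_%s' % key,
>>                        python_callable=train_model,
>>                        op_kwargs={'partition_key': key},
>>                        dag=dag)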
>> 
>> --
>> Regards,
>> Ashish
>> 
>> 
>> 
>>> On 09-Aug-2017, at 10:49 PM, Van Klaveren, Brian N. 
>>> <b...@slac.stanford.edu> wrote:
>>> 
>>> Have you looked into subdags?
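>>> 
>>> As a very rough, untested sketch (all names are placeholders):
>>> 
>>>     # A SubDagOperator embeds a child DAG as a single task of the parent,
>>>     # so the fan-out over partition keys can live inside the subdag.
>>>     from datetime import datetime
>>>     from airflow.models import DAG
>>>     from airflow.operators.python_operator import PythonOperator
>>>     from airflow.operators.subdag_operator import SubDagOperator
>>> 
>>>     default_args = {'start_date': datetime(2017, 8, 1)}
>>>     dag = DAG('parent_dag', default_args=default_args,
>>>               schedule_interval='@daily')
>>> 
>>>     def process(partition_key):
>>>         pass  # work on one partition here
>>> 
>>>     def make_subdag(parent, child_id, keys):
>>>         # Subdag ids must follow the 'parent.child' naming convention.
>>>         sub = DAG('%s.%s' % (parent.dag_id, child_id),
>>>                   default_args=default_args,
>>>                   schedule_interval=parent.schedule_interval)
>>>         for key in keys:
>>>             PythonOperator(task_id='process_%s' % key,
>>>                            python_callable=process,
>>>                            op_kwargs={'partition_key': key},
>>>                            dag=sub)
>>>         return sub
>>> 
>>>     SubDagOperator(task_id='by_partition',
>>>                    subdag=make_subdag(dag, 'by_partition', ['a', 'b', 'c']),
>>>                    dag=dag)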
>>> 
>>> Brian
>>> 
>>> 
>>>> On Aug 9, 2017, at 10:16 AM, Ashish Rawat <ashish.ra...@myntra.com> wrote:
>>>> 
>>>> Thanks, George. Our use case also involves periodic scheduling (daily) 
>>>> as well as task dependencies, which is why we chose Airflow. However, 
>>>> some of the tasks in a DAG have now become too big to execute on one 
>>>> node, and we want to split them into multiple tasks to reduce execution 
>>>> time. Would you recommend firing off parts of an Airflow DAG in another 
>>>> framework?
>>>> 
>>>> --
>>>> Regards,
>>>> Ashish
>>>> 
>>>> 
>>>> 
>>>>> On 09-Aug-2017, at 10:40 PM, George Leslie-Waksman 
>>>>> <geo...@cloverhealth.com.INVALID> wrote:
>>>>> 
>>>>> Airflow is best for situations where you want to run different tasks that
>>>>> depend on each other or process data that arrives over time. If your goal
>>>>> is to take a large dataset, split it up, and process chunks of it, there
>>>>> are probably other tools better suited to your purpose.
>>>>> 
>>>>> Off the top of my head, you might consider Dask:
>>>>> https://dask.pydata.org/en/latest/ or directly using Celery:
>>>>> http://www.celeryproject.org/
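>>>>> 
>>>>> With Dask, for instance, the split-and-process pattern can be as small
>>>>> as this (sketch only; train_model and the key values are stand-ins):
>>>>> 
>>>>>     # Fan a function out over the partition keys; Dask handles the
>>>>>     # scheduling, locally or across a cluster.
>>>>>     from dask import delayed, compute
>>>>> 
>>>>>     def train_model(partition_key):
>>>>>         return partition_key  # train on one partition of the data
>>>>> 
>>>>>     partition_keys = ['a', 'b', 'c']  # stand-in key values
>>>>>     tasks = [delayed(train_model)(k) for k in partition_keys]
>>>>>     results = compute(*tasks)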
>>>>> 
>>>>> --George
>>>>> 
>>>>> On Wed, Aug 9, 2017 at 9:52 AM Ashish Rawat <ashish.ra...@myntra.com> 
>>>>> wrote:
>>>>> 
>>>>>> Hi - Can anyone please provide some pointers on this use case with
>>>>>> Airflow?
>>>>>> 
>>>>>> --
>>>>>> Regards,
>>>>>> Ashish
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 03-Aug-2017, at 9:13 PM, Ashish Rawat <ashish.ra...@myntra.com> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> We have a use case where we are running some R/Python-based data science
>>>>>>> models, each of which executes on a single node. The execution time of
>>>>>>> the models is constantly increasing, and we are now planning to split
>>>>>>> the model training by a partition key and distribute the workload over
>>>>>>> multiple machines.
>>>>>>> 
>>>>>>> Does Airflow provide a simple way to split a task into multiple tasks,
>>>>>>> each of which works on a specific value of the key?
>>>>>>> 
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Ashish
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
> 
