Hi Ashish,

Partitioned tasks can often be modeled as n-many triggered, parameterized 
DAGs or subdags, where the parameter is the partition key. I've used this 
pattern a lot in other systems, though not with Airflow specifically, so I'm 
not sure exactly how you'd implement it there, but hopefully this gives you 
some ideas.
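As a rough, scheduler-agnostic sketch of the fan-out shape (all names below 
are hypothetical, and this is plain Python rather than Airflow code; in 
Airflow each per-key task would instead be an operator instance created in a 
loop inside the DAG file):

```python
# Hypothetical partition keys; in practice these might be dates, shards, etc.
PARTITION_KEYS = ["2017-08-01", "2017-08-02", "2017-08-03"]

def train_partition(key):
    """Stand-in for the per-partition model training step."""
    return "trained model for partition %s" % key

def build_tasks(keys):
    """Build one task (callable) per partition key.

    In Airflow, each entry here would instead be e.g. a PythonOperator
    instantiated in a loop and added to the DAG, so the scheduler can run
    the partitions in parallel on different workers.
    """
    # Bind each key via a default argument so the closures don't all
    # capture the last loop value.
    return {key: (lambda k=key: train_partition(k)) for key in keys}

tasks = build_tasks(PARTITION_KEYS)
results = {key: task() for key, task in tasks.items()}
```

The key point is that the set of tasks is generated from data (the partition 
keys) rather than written out by hand, so adding a partition only means 
extending the key list.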

Brian


> On Aug 9, 2017, at 10:23 AM, Ashish Rawat <ashish.ra...@myntra.com> wrote:
> 
> Yes, I believe they are used for splitting a bigger DAG into smaller DAGs, 
> for clarity and reusability. In our use case, we need to split/replicate a 
> specific task into multiple tasks, based on the different values of a key, 
> essentially data partitioning and processing.
> 
> --
> Regards,
> Ashish
> 
> 
> 
>> On 09-Aug-2017, at 10:49 PM, Van Klaveren, Brian N. <b...@slac.stanford.edu> 
>> wrote:
>> 
>> Have you looked into subdags?
>> 
>> Brian
>> 
>> 
>>> On Aug 9, 2017, at 10:16 AM, Ashish Rawat <ashish.ra...@myntra.com> wrote:
>>> 
>>> Thanks George. Our use case also involves periodic scheduling (daily), as 
>>> well as task dependencies, which is why we chose Airflow. However, some of 
>>> the tasks in a DAG have now become too big to execute on one node, and we 
>>> want to split them into multiple tasks to reduce execution time. Would you 
>>> recommend firing parts of an Airflow DAG in another framework?
>>> 
>>> --
>>> Regards,
>>> Ashish
>>> 
>>> 
>>> 
>>>> On 09-Aug-2017, at 10:40 PM, George Leslie-Waksman 
>>>> <geo...@cloverhealth.com.INVALID> wrote:
>>>> 
>>>> Airflow is best for situations where you want to run different tasks that
>>>> depend on each other or process data that arrives over time. If your goal
>>>> is to take a large dataset, split it up, and process chunks of it, there
>>>> are probably other tools better suited to your purpose.
>>>> 
>>>> Off the top of my head, you might consider Dask:
>>>> https://dask.pydata.org/en/latest/ or directly using Celery:
>>>> http://www.celeryproject.org/
>>>> 
>>>> --George
>>>> 
>>>> On Wed, Aug 9, 2017 at 9:52 AM Ashish Rawat <ashish.ra...@myntra.com> 
>>>> wrote:
>>>> 
>>>>> Hi - Can anyone please provide some pointers for this use case over
>>>>> Airflow?
>>>>> 
>>>>> --
>>>>> Regards,
>>>>> Ashish
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 03-Aug-2017, at 9:13 PM, Ashish Rawat <ashish.ra...@myntra.com>
>>>>> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> We have a use case where we are running some R/Python-based data science
>>>>>> models, which execute on a single node. The execution time of the models
>>>>>> is constantly increasing, and we are now planning to split the model
>>>>>> training by a partition key and distribute the workload over multiple
>>>>>> machines.
>>>>>> 
>>>>>> Does Airflow provide some simple way to split a task into multiple
>>>>>> tasks, each of which will work on a specific value of the key?
>>>>>> 
>>>>>> --
>>>>>> Regards,
>>>>>> Ashish
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>> 
>> 
> 
