Yes, I believe subdags are meant for splitting a bigger DAG into smaller
DAGs, for clarity and reusability. In our use case, we need to split/replicate
a specific task into multiple tasks based on the different values of a key,
which is essentially data partitioning followed by parallel processing.
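
For context, the workaround we are prototyping is to fan the task out when
the DAG file is parsed, creating one task per partition key so the executor
can run them in parallel. A rough sketch follows; the partition keys, dag_id,
and train_model body are placeholders for illustration, not our actual code:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# Placeholder partition keys; in practice these would come from the
# actual data layout (e.g. one key per region or customer segment).
PARTITION_KEYS = ["p0", "p1", "p2", "p3"]

def train_model(partition_key):
    # Placeholder for the real R/Python model training on one partition.
    print("training model for partition %s" % partition_key)

dag = DAG(
    dag_id="partitioned_model_training",
    start_date=datetime(2017, 8, 1),
    schedule_interval="@daily",
)

# One task per partition key; with the CeleryExecutor these tasks can
# run concurrently on different worker machines.
for key in PARTITION_KEYS:
    PythonOperator(
        task_id="train_%s" % key,
        python_callable=train_model,
        op_kwargs={"partition_key": key},
        dag=dag,
    )

The catch is that the set of keys must be known at DAG parse time; if the
partitions change from run to run, this pattern does not help, which is why
we are asking whether Airflow offers something more dynamic.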

--
Regards,
Ashish



> On 09-Aug-2017, at 10:49 PM, Van Klaveren, Brian N. <b...@slac.stanford.edu> 
> wrote:
> 
> Have you looked into subdags?
> 
> Brian
> 
> 
>> On Aug 9, 2017, at 10:16 AM, Ashish Rawat <ashish.ra...@myntra.com> wrote:
>> 
>> Thanks George. Our use case also involves periodic scheduling (daily), as 
>> well as task dependencies, which is why we chose Airflow. However, some of 
>> the tasks in a DAG have now become too big to execute on one node, and we 
>> want to split them into multiple tasks to reduce execution time. Would you 
>> recommend delegating parts of an Airflow DAG to another framework?
>> 
>> --
>> Regards,
>> Ashish
>> 
>> 
>> 
>>> On 09-Aug-2017, at 10:40 PM, George Leslie-Waksman 
>>> <geo...@cloverhealth.com.INVALID> wrote:
>>> 
>>> Airflow is best for situations where you want to run different tasks that
>>> depend on each other or process data that arrives over time. If your goal
>>> is to take a large dataset, split it up, and process chunks of it, there
>>> are probably other tools better suited to your purpose.
>>> 
>>> Off the top of my head, you might consider Dask:
>>> https://dask.pydata.org/en/latest/ or directly using Celery:
>>> http://www.celeryproject.org/
>>> 
>>> --George
>>> 
>>> On Wed, Aug 9, 2017 at 9:52 AM Ashish Rawat <ashish.ra...@myntra.com> wrote:
>>> 
>>>> Hi - Can anyone please provide some pointers for this use case with
>>>> Airflow?
>>>> 
>>>> --
>>>> Regards,
>>>> Ashish
>>>> 
>>>> 
>>>> 
>>>>> On 03-Aug-2017, at 9:13 PM, Ashish Rawat <ashish.ra...@myntra.com>
>>>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> We have a use case where we are running some R/Python-based data science
>>>> models, which execute on a single node. The execution time of the models
>>>> is constantly increasing, and we are now planning to split the model
>>>> training by a partition key and distribute the workload over multiple
>>>> machines.
>>>>> 
>>>>> Does Airflow provide some simple way to split a task into multiple
>>>> tasks, all of which will work on a specific value of the key?
>>>>> 
>>>>> --
>>>>> Regards,
>>>>> Ashish
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
> 
