Have you looked into subdags?
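
If the goal is one training task per partition key, a SubDagOperator can generate them in a loop. A rough, untested sketch for Airflow 1.x (PARTITION_KEYS, the train callable, and the dates are placeholders for your setup):

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators.subdag_operator import SubDagOperator

PARTITION_KEYS = ["key_a", "key_b", "key_c"]  # placeholder partition keys

def train(partition_key):
    # call out to your R/Python model training for this partition
    print("training model for partition %s" % partition_key)

def make_training_subdag(parent_dag_id, child_id, default_args):
    # subdag dag_ids must follow the <parent_dag_id>.<child_task_id> convention
    subdag = DAG(dag_id="%s.%s" % (parent_dag_id, child_id),
                 default_args=default_args,
                 schedule_interval="@daily")
    for key in PARTITION_KEYS:
        PythonOperator(task_id="train_%s" % key,
                       python_callable=train,
                       op_kwargs={"partition_key": key},
                       dag=subdag)
    return subdag

default_args = {"owner": "airflow", "start_date": datetime(2017, 8, 1)}

dag = DAG("model_training", default_args=default_args,
          schedule_interval="@daily")

train_all = SubDagOperator(
    task_id="train_all_partitions",
    subdag=make_training_subdag(dag.dag_id, "train_all_partitions",
                                default_args),
    dag=dag,
)

How much actual parallelism you get then comes down to the executor; with the Celery executor, the per-key tasks can be picked up by different worker machines.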

Brian


> On Aug 9, 2017, at 10:16 AM, Ashish Rawat <ashish.ra...@myntra.com> wrote:
> 
> Thanks George. Our use case also involves periodic scheduling (daily) as
> well as task dependencies, which is why we chose Airflow. However, some of
> the tasks in a DAG have now become too big to execute on one node, and we
> want to split them into multiple tasks to reduce execution time. Would you
> recommend firing parts of an Airflow DAG in another framework?
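> 
> For instance, would a pattern like the following be reasonable, with a
> single Airflow task fanning the per-key work out to a Dask cluster? (A
> rough, untested sketch; the scheduler address and the training function
> are placeholders.)
> 
> from dask.distributed import Client
> 
> def train_partition(key):
>     """Run the model training for one partition key (placeholder)."""
> 
> def train_all_partitions(**context):
>     # hypothetical address of an existing Dask scheduler
>     client = Client("tcp://dask-scheduler:8786")
>     futures = client.map(train_partition, ["key_a", "key_b", "key_c"])
>     client.gather(futures)  # block until all partitions finish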
> 
> --
> Regards,
> Ashish
> 
> 
> 
>> On 09-Aug-2017, at 10:40 PM, George Leslie-Waksman 
>> <geo...@cloverhealth.com.INVALID> wrote:
>> 
>> Airflow is best for situations where you want to run different tasks that
>> depend on each other or process data that arrives over time. If your goal
>> is to take a large dataset, split it up, and process chunks of it, there
>> are probably other tools better suited to your purpose.
>> 
>> Off the top of my head, you might consider Dask:
>> https://dask.pydata.org/en/latest/ or directly using Celery:
>> http://www.celeryproject.org/
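>> 
>> By way of illustration, splitting a dataset and processing the chunks in
>> parallel with dask.delayed looks roughly like this (a minimal sketch; the
>> chunking and the per-chunk computation are placeholders):
>> 
>> from dask import compute, delayed
>> 
>> @delayed
>> def process_chunk(chunk):
>>     return sum(chunk)  # placeholder per-chunk computation
>> 
>> # split the dataset, process the chunks in parallel, combine the results
>> chunks = [range(i, i + 100) for i in range(0, 1000, 100)]
>> results = compute(*[process_chunk(c) for c in chunks])
>> print(sum(results))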
>> 
>> --George
>> 
>> On Wed, Aug 9, 2017 at 9:52 AM Ashish Rawat <ashish.ra...@myntra.com> wrote:
>> 
>>> Hi - Can anyone please provide some pointers on handling this use case
>>> in Airflow?
>>> 
>>> --
>>> Regards,
>>> Ashish
>>> 
>>> 
>>> 
>>>> On 03-Aug-2017, at 9:13 PM, Ashish Rawat <ashish.ra...@myntra.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> We have a use case where we are running some R/Python based data science
>>>> models, which execute on a single node. The execution time of the models
>>>> is constantly increasing, so we are now planning to split the model
>>>> training by a partition key and distribute the workload over multiple
>>>> machines.
>>>> 
>>>> Does Airflow provide some simple way to split a task into multiple
>>>> tasks, each of which will work on a specific value of the key?
>>>> 
>>>> --
>>>> Regards,
>>>> Ashish
>>>> 
>>>> 
>>>> 
>>> 
>>> 
> 
