Thanks George. Our use case also involves periodic scheduling (daily) as well as
task dependencies, which is why we chose Airflow. However, some of the tasks in
a DAG have now become too big to execute on a single node, and we want to split
them into multiple tasks to reduce execution time. Would you recommend
offloading parts of an Airflow DAG to another framework? (I've put a rough
sketch of how I read the Dask suggestion below the quoted thread.)
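
For concreteness, here is a rough sketch of what I have in mind: generating
one task per partition key inside the DAG definition, so that with the
CeleryExecutor each partition can train on a separate worker. (The partition
list, task ids, and train_partition function are placeholders, not our actual
code.)

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def train_partition(partition_key):
    # Placeholder: fit one model on the rows belonging to this partition key.
    print("training model for partition %s" % partition_key)


dag = DAG(
    dag_id="partitioned_model_training",
    schedule_interval="@daily",
    start_date=datetime(2017, 8, 1),
)

# One task per partition key; with the CeleryExecutor these can run in
# parallel on different worker nodes.
PARTITION_KEYS = ["north", "south", "east", "west"]  # placeholder values

for key in PARTITION_KEYS:
    PythonOperator(
        task_id="train_%s" % key,
        python_callable=train_partition,
        op_kwargs={"partition_key": key},
        dag=dag,
    )

One assumption here is that the set of partition keys is known when the DAG
file is parsed.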

--
Regards,
Ashish



> On 09-Aug-2017, at 10:40 PM, George Leslie-Waksman 
> <geo...@cloverhealth.com.INVALID> wrote:
> 
> Airflow is best for situations where you want to run different tasks that
> depend on each other or process data that arrives over time. If your goal
> is to take a large dataset, split it up, and process chunks of it, there
> are probably other tools better suited to your purpose.
> 
> Off the top of my head, you might consider Dask:
> https://dask.pydata.org/en/latest/ or directly using Celery:
> http://www.celeryproject.org/
> 
> --George
> 
> On Wed, Aug 9, 2017 at 9:52 AM Ashish Rawat <ashish.ra...@myntra.com> wrote:
> 
>> Hi - Can anyone please provide some pointers for this use case over
>> Airflow?
>> 
>> --
>> Regards,
>> Ashish
>> 
>> 
>> 
>>> On 03-Aug-2017, at 9:13 PM, Ashish Rawat <ashish.ra...@myntra.com>
>> wrote:
>>> 
>>> Hi,
>>> 
>>> We have a use case where we are running some R/Python-based data science
>> models, each of which executes on a single node. The execution time of the
>> models is constantly increasing, and we are now planning to split the model
>> training by a partition key and distribute the workload over multiple
>> machines.
>>> 
>>> Does Airflow provide some simple way to split a task into multiple
>> tasks, each of which works on a specific value of the key?
>>> 
>>> --
>>> Regards,
>>> Ashish
>>> 
>>> 
>>> 
>> 
>> 
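
P.S. George, to check my understanding of the Dask suggestion, here is a
rough sketch of how I read it (the input file, the partition_key column, and
the train_model function are placeholders, not our real pipeline):

import pandas as pd
from dask import delayed, compute


def train_model(partition_key, partition_df):
    # Placeholder: fit one model on the rows for a single partition key.
    return partition_key, len(partition_df)


data = pd.read_csv("training_data.csv")  # placeholder input
tasks = [
    delayed(train_model)(key, group)
    for key, group in data.groupby("partition_key")
]
results = compute(*tasks)  # chunks run in parallel; a dask.distributed
                           # cluster would spread them over multiple machines

If that is roughly what you meant, the remaining question for us is whether
to call something like this from inside a single Airflow task, or to split
the work at the DAG level as sketched above.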
