As far as I know, I'm the only person using Dask with Airflow at the moment. I've been using Dask for a variety of other (non-Airflow) tasks and have found it to be a great tool. However, it's important to note that Celery is a much more mature project with finer control over how tasks are executed. In fact, Dask's objectives are totally different (I think of it as "pure-Python Spark"), but it happens to expose similar functionality to Celery through its Distributed subproject.
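To make "similar functionality to Celery" concrete, here's a minimal sketch of submitting a task through Dask Distributed's Client. The `add` function is just an illustration; `Client(processes=False)` spins up an in-process scheduler and workers, so no separate dask-scheduler/dask-worker daemons are needed to try it:

```python
from dask.distributed import Client

def add(x, y):
    return x + y

if __name__ == "__main__":
    # Start a local, in-process Dask cluster (no external daemons).
    client = Client(processes=False)
    # Submit a task asynchronously, roughly like Celery's delay().
    future = client.submit(add, 1, 2)
    # Block until the task finishes and fetch the result.
    print(future.result())  # -> 3
    client.close()
```

Against a real deployment you'd pass the scheduler's address instead, e.g. `Client("tcp://scheduler-host:8786")`.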
I added a DaskExecutor to Airflow in my last commit and am working on improving the unit tests now. I've been running the DaskExecutor in a test environment with good results, but between the fact that you have to run Airflow's bleeding-edge master branch to get it and the fact that I'm the only person kicking its tires at the moment, I would only recommend using it if you like to live very dangerously indeed. In the near future, I can see Dask becoming a recommended way to scale Airflow beyond a single machine due to the ease of setting it up -- but not yet.

On Mon, Feb 13, 2017 at 11:04 AM Bolke de Bruin <[email protected]> wrote:

Dask just landed in master. So no, Celery is the most used option to scale out. Always interested in what you are running into, but please be prepared to provide a lot of info on your setup.

- Bolke

> On 13 Feb 2017, at 17:01, EKC (Erik Cederstrand) <[email protected]> wrote:
>
> Hello all,
>
> I'm investigating why some of our DAGs are not being scheduled properly (I ran into https://issues.apache.org/jira/browse/AIRFLOW-342, among other things). Coupled with comments on this list, I'm getting the impression that Celery is a second-class citizen and core developers are mainly using Dask. Is this correct?
>
> If Dask support is simply more mature and more likely to have issues responded to, I'll consider migrating our installation.
>
> Thanks,
>
> Erik
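P.S. For anyone who wants to kick the tires despite the warnings above, switching executors is a config change in airflow.cfg. A sketch of my test setup -- this assumes a separate dask-scheduler process is already listening on its default port (8786), with dask-worker processes pointed at it:

```ini
[core]
# Hand task execution to the DaskExecutor instead of the
# default local executor or CeleryExecutor.
executor = DaskExecutor

[dask]
# Address of the running dask-scheduler; workers started with
# `dask-worker 127.0.0.1:8786` will pick up Airflow tasks.
cluster_address = 127.0.0.1:8786
```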
