Thanks to both for correcting my understanding. I'll see what information I can collect on our issues and report back if I get anything coherent.
Kind regards,
Erik

________________________________
From: Jeremiah Lowin <[email protected]>
Sent: Monday, February 13, 2017 6:26:15 PM
To: [email protected]
Subject: Re: Celery or Dask?

As far as I know I'm the only person using Dask with Airflow at the moment. I've been using Dask for a variety of other (non-Airflow) tasks and have found it to be a great tool. However, it's important to note that Celery is a much more mature project with finer control over how tasks are executed. In fact Dask's objectives are totally different (I think of it as "pure-Python Spark") but it happens to expose similar functionality to Celery through its Distributed subproject.

I added a DaskExecutor to Airflow in my last commit and am working on improving the unit tests now. I've been running the DaskExecutor in a test environment with good results, but between the fact that you have to run Airflow's bleeding-edge master branch to get it and that I'm the only person kicking its tires (at the moment), I would only recommend using it if you like to live very dangerously indeed.

In the near future, I can see Dask being a recommended way to scale Airflow beyond a single machine due to the ease of setting it up -- but not yet.

On Mon, Feb 13, 2017 at 11:04 AM Bolke de Bruin <[email protected]> wrote:

Dask just landed in master. So no, Celery is the most used option to scale out. Always interested in what you are running into, but please be prepared to provide a lot of info on your setup.

- Bolke

> On 13 Feb 2017, at 17:01, EKC (Erik Cederstrand) <[email protected]> wrote:
>
> Hello all,
>
> I'm investigating why some of our DAGs are not being scheduled properly (ran into https://issues.apache.org/jira/browse/AIRFLOW-342, among other things). Coupled with comments on this list, I'm getting the impression that Celery is a second-class citizen and core developers are mainly using Dask. Is this correct?
>
> If Dask support is simply more mature and more likely to have issues responded to, I'll consider migrating our installation.
>
> Thanks,
>
> Erik
