Hey Sid, I question the need for both local and celery executors (leaving sequential out of this). I think all we need is a scheduler + distributed executor. If you run only one each, then you have the LocalExecutor. The main thing that I care about is that this one thing is easy out of the box, and well tested. Right now, Celery is neither of those things.
If we just had:

    airflow webserver
    airflow scheduler
    airflow executor

I'd be happy. If `airflow executor` could start as SQLAlchemy/DB backed (just like LocalExecutor), but could be upgraded (without forcing you) to Redis or RabbitMQ or SQS or whatever, great. I just want it easy and tested/stable.

Cheers,
Chris

On Fri, May 13, 2016 at 11:57 AM, Siddharth Anand <[email protected]> wrote:

> Bolke, thanks for providing the document and for generally driving a path
> forward.
>
> Regarding Local vs. Celery: I think the project benefits greatly from
> having multiple executors. Widespread adoption of Airflow involves keeping
> the barriers to adoption as low as possible. We ship Airflow with a SQLite
> DB and SequentialExecutor so that someone can simply install it on his/her
> laptop, run the examples, and immediately get familiar with the awesome UI
> features. Soon after, he/she will want to run it on a test machine and
> share it with his/her colleagues/management. Since some UI features don't
> work with SQLAlchemy/SQLite, if the engineer were to run the
> SequentialExecutor, his/her colleagues would likely shoot the project down.
> Hence, this engineer (let's call him/her our champion) will need to install
> the LocalExecutor and run it against a non-SQLite DB. The champion may need
> to spin up a single machine in the cloud or request a machine from his/her
> Ops team for a POC. Once the champion demos Airflow and people love it,
> people will start using it. The LocalExecutor is the easiest to use, set
> up, and justify in terms of machine spend and complexity for a budding
> project in a company. It is also possible that scale never becomes an
> issue, in which case the level of setup was justified by the benefit to the
> company. BTW, at Agari, we didn't have great Terraform and Ansible coverage
> when I started using Airflow - we do now. As a result, setting up new
> machines in the cloud was a little painful.
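[Editor's note: Sid's "LocalExecutor against a non-SQLite DB" step is, concretely, a small airflow.cfg change; a minimal sketch of the relevant settings, with an illustrative Postgres connection string:]

```ini
[core]
# LocalExecutor runs task instances as local subprocesses of the scheduler.
# It requires a real database backend (Postgres/MySQL), not SQLite.
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost/airflow
```
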
> Now, once the company becomes dependent on Airflow, and if scale becomes a
> major issue, then it is wise to take the next architectural step, which in
> the case of the CeleryExecutor means installing Redis and a bunch of
> Celery components. By providing the lowest barrier to entry for each level
> of company/developer commitment, we are simply recognizing how companies
> work.
>
> My point in yesterday's conversation was that we need multiple executors.
> For any scheduler/core/executor changes made to the project, we need to
> make sure they are tested on multiple executors.
>
> Also, another point I would like to raise: the documentation around
> getting Airflow running with Celery is very poor. I'd really like to see a
> tutorial with gotchas published. I tried setting it up for a day and then
> dropped it, preferring to run 2 schedulers (with LocalExecutors) for both
> increased scheduling redundancy and greater executor throughput. 10% of our
> GitHub issues reflect this lack of documentation and insight. It's great
> that Airbnb is using it, but there is no clear path for others to follow.
> As a result, I suspect a small minority run with Celery. And then they run
> into either pickling issues, Celery queue management/insight questions,
> DAG sync problems (e.g. my start date is not honored), etc...
>
> -s
>
> On Friday, May 13, 2016 3:20 PM, Chris Riccomini
> <[email protected]> wrote:
>
> Hey Bolke,
>
> Thanks for writing this up. I don't have a ton of feedback, as I'm not
> terribly familiar with the internals of the scheduler, but two notes:
>
> 1. A major +1 for the celery/local executor discussion. IMO, Celery is a
> net-negative on this project, and should be fully removed in favor of the
> LocalExecutor. Splitting the scheduler from the executor in the
> LocalExecutor would basically give parity with Celery, AFAICT, and sounds
> much easier to operate to me.
>
> 2.
> If we are moving towards Docker as a container for DAG execution in the
> future, it's probably worth considering how these changes are going to
> affect the Docker implementation. If we do pursue (1), how does this look
> in a Dockerized world? Is the executor still going to exist? Would the
> scheduler interact directly with Kubernetes/Mesos instead?
>
> Cheers,
> Chris
>
> On Fri, May 13, 2016 at 3:41 AM, Bolke de Bruin <[email protected]> wrote:
>
> > Hi,
> >
> > We did a video conference on the scheduler with a couple of the
> > committers yesterday. The meeting was not there to finalize any roadmap
> > but more to get a general understanding of each other's work. To keep it
> > as transparent as possible, hereby a summary:
> >
> > Who were attending: Max, Paul, Arthur, Dan, Sid, Bolke
> >
> > The discussion centered around the scheduler, sometimes diving into
> > connected topics such as pooling and executors. Paul discussed his work
> > on making the scheduler more robust against faulty Dags and also on
> > making the scheduler faster by not making it dependent on the slowest
> > parsed Dag. PR work will be provided shortly to open it up to the
> > community, as the aim is to have this in by end of Q2 (no promises ;-)).
> >
> > Continuing the train of thought of making the scheduler faster, the
> > separation of executor and scheduler was also discussed. It was remarked
> > by Max that doing this separation would essentially create the
> > equivalent of the Celery workers. Sid mentioned that Celery seemed to be
> > a culprit of setup issues and that people tend to use the LocalExecutor
> > instead. The discussion was parked, as it needs to be discussed with a
> > wider audience (mailing list, community) and is not something that we
> > think is required in the near term (obviously PRs are welcome).
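[Editor's note: the Celery setup that Sid found under-documented boils down to a config change plus a worker process per machine. A sketch of the relevant airflow.cfg lines as of this era of Airflow, with an illustrative Redis broker; exact key names may differ by version:]

```ini
[core]
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@metadata-db/airflow

[celery]
# Redis (or RabbitMQ) carries task messages from the scheduler to workers.
broker_url = redis://redis-host:6379/0
# Workers record task results here; commonly the same metadata database.
celery_result_backend = db+postgresql://airflow:airflow@metadata-db/airflow
```

[Each worker machine then needs the same airflow.cfg and the same DAG files, and runs `airflow worker`. That DAG-sync requirement is one source of the "my start date is not honored" class of problems Sid mentions.]
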
> > Next, we discussed some of the scheduler issues that are marked in the
> > attached document
> > (https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg). Core
> > issues discussed were 1) TaskInstances can be created without a DagRun,
> > 2) non-intuitive behavior with start_date and also depends_on_past, and
> > 3) lineage. It was agreed that the proposals to add a previous field to
> > the DagRun model and to make backfills (a.o.) use DagRun make sense.
> > More discussion was around the lineage part, as that involves more
> > in-depth changes, specifically to TaskInstances. Still, the consensus in
> > the group was that it is necessary to make steps here and that they are
> > long overdue.
> >
> > Lastly, we discussed the draft scheduler roadmap (see doc) to see if
> > there were any misalignments. While there are some differences in
> > details, we think the steps are quite compatible and the differences can
> > be worked out.
> >
> > So that was it; in case I missed anything, correct me. In case of
> > questions, suggestions, etc., don't hesitate to put them on the list.
> >
> > Cheers,
> > Bolke
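[Editor's note: the "previous field on DagRun" proposal can be illustrated with a toy model. All names below are hypothetical, not the actual Airflow schema; the point is that an explicit back-pointer turns a depends_on_past check into a pointer walk rather than a query for the DagRun with the greatest earlier execution_date:]

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DagRun:
    """Toy stand-in for Airflow's DagRun model (illustrative only)."""
    execution_date: str
    state: str                            # "success", "failed", "running"
    previous: Optional["DagRun"] = None   # the proposed back-pointer

def can_run(run: DagRun, depends_on_past: bool) -> bool:
    """depends_on_past check using the explicit previous link.

    Without the link, the scheduler must instead query for the run with
    the greatest execution_date less than this run's execution_date.
    """
    if not depends_on_past or run.previous is None:
        return True  # first run ever, or no past dependency
    return run.previous.state == "success"

# Chain of runs: Jan 1 succeeded, Jan 2 failed, Jan 3 is pending.
r1 = DagRun("2016-01-01", "success")
r2 = DagRun("2016-01-02", "failed", previous=r1)
r3 = DagRun("2016-01-03", "running", previous=r2)
```

[With this chain, r3 is blocked when depends_on_past is set, because its previous run failed; r2 was allowed to run since r1 succeeded.]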
