A suggestion for maintaining stability: as a "test mode" item, write database triggers for MySQL or Postgres that fail if a database transaction puts the database in a bogus state.
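
To make that concrete, here is a minimal sketch of such a guard trigger. Everything in it is an assumption for illustration: the two-table schema is a hypothetical simplification, not Airflow's real one, and it uses Python's bundled sqlite3 only so the example runs anywhere. MySQL and Postgres trigger syntax differs. The "bogus state" it rejects is one raised in the thread below: a TaskInstance created without a DagRun.

```python
import sqlite3

# Hypothetical, simplified schema: NOT Airflow's actual tables.
# SQLite stands in for MySQL/Postgres so the sketch is self-contained.
SCHEMA = """
CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT);
CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, execution_date TEXT);

-- "Test mode" guard: abort any insert of a task instance that has
-- no matching dag run.
CREATE TRIGGER task_instance_requires_dag_run
BEFORE INSERT ON task_instance
FOR EACH ROW
WHEN NOT EXISTS (
    SELECT 1 FROM dag_run
    WHERE dag_id = NEW.dag_id AND execution_date = NEW.execution_date
)
BEGIN
    SELECT RAISE(ABORT, 'task_instance has no matching dag_run');
END;
"""


def make_test_db():
    """Create an in-memory database with the guard trigger installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_insert_task(conn, task_id, dag_id, execution_date):
    """Return True if the insert was accepted, False if the trigger rejected it."""
    try:
        conn.execute(
            "INSERT INTO task_instance VALUES (?, ?, ?)",
            (task_id, dag_id, execution_date),
        )
        return True
    except sqlite3.DatabaseError:
        return False
```

In a test run, any scheduler or executor code path that tried to write an orphaned row would then fail loudly at the database layer instead of silently leaving bad data behind.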

On Fri, May 13, 2016 at 5:37 PM, siddharth anand <[email protected]> wrote:

> I'm not familiar enough with Celery -- refer to my comment about giving up
> after a day of playing with it -- to discount it totally. I'd actually feel
> better informed once I got it running and could publish a "take these
> steps", which I'm surprised that no one has done.
>
> I'm all for simple, though I'm not sure "distributed executor" necessarily
> falls in that camp. I'm open to any idea and PR, however.
>
> -s
>
> On Fri, May 13, 2016 at 10:40 PM, Chris Riccomini <[email protected]>
> wrote:
>
> > Hey Sid,
> >
> > I question the need for both local and celery executors (leaving
> > sequential out of this). I think all we need is a scheduler + distributed
> > executor. If you run only one each, then you have the LocalExecutor. The
> > main thing that I care about is that this one thing is easy out of the
> > box, and well tested. Right now, Celery is neither of those things.
> >
> > If we just had:
> >
> > airflow webserver
> > airflow scheduler
> > airflow executor
> >
> > I'd be happy. If `airflow executor` could start as SQL alchemy/DB backed
> > (just like LocalExecutor), but be upgraded (but not force you) to Redis or
> > RabbitMQ or SQS or whatever, great.
> >
> > I just want it easy and tested/stable.
> >
> > Cheers,
> > Chris
> >
> > On Fri, May 13, 2016 at 11:57 AM, Siddharth Anand <[email protected]>
> > wrote:
> >
> >> Bolke, Thanks for providing the document and for generally driving a path
> >> forward.
> >>
> >> Regarding Local vs. Celery I think the project benefits greatly from
> >> having multiple executors. Widespread adoption of Airflow involves keeping
> >> the barriers to adoption as low as possible. We ship Airflow with a SQLite
> >> DB and SequentialExecutor so that someone can simply install it on his/her
> >> laptop, run the examples, immediately get familiar with the awesome UI
> >> features. Soon after, he/she will want to run it on a test machine and
> >> share it with his/her colleagues/management. Since some UI features don't
> >> work with SQLAlchemy/SQLite, if the engineer were to run the
> >> SequentialExecutor, his/her colleagues would likely shoot the project
> >> down. Hence, this engineer (let's call him/her our champion) will need to
> >> install the LocalExecutor and run it against a non SQLite DB. The champion
> >> may need to spin up a single machine in the cloud or request a machine
> >> from his/her Ops team for a POC. Once the champion demos Airflow and
> >> people love it, people will start using it. The LocalExecutor is the
> >> easiest to use, setup and justify in terms of machine spend and complexity
> >> for a budding project in a company. It is also possible that scale never
> >> becomes an issue, then the level of setup was justified by the benefit to
> >> the company. BTW, at Agari, we didn't have great Terraform and Ansible
> >> coverage when I started using Airflow - we do now. As a result, setting up
> >> new machines in the cloud was a little painful.
> >>
> >> Now, once the company becomes dependent on airflow and if scale becomes a
> >> major issue, then it is wise to take the next architectural step, which in
> >> the case of the CeleryExecutor, means installing Redis and a bunch of
> >> Celery components. By providing the lowest barrier to entry for each level
> >> of company/developer commitment, we are simply recognizing how companies
> >> work.
> >>
> >> My point in yesterday's conversation was that we need multiple executors.
> >> For any scheduler/core/executor changes made to the project, we need to
> >> make sure it is tested on multiple executors.
> >>
> >> Also, another point I would like to raise. The documentation around
> >> getting Airflow running with Celery is very poor. I'd really like to see a
> >> tutorial with gotchas published. I tried setting it up for a day and then
> >> dropped it, preferring to run 2 schedulers (with LocalExecutors) for both
> >> increased scheduling redundancy and greater executor throughput. 10% of
> >> our GitHub issues reflect this lack of documentation and insight. It's
> >> great that Airbnb is using it, but there is no clear path for others to
> >> follow. As a result, I suspect a small minority run with Celery. And then
> >> they run into either pickling issues, celery queue management/insight
> >> questions, dag sync problems (e.g. my start date is not honored), etc...
> >>
> >> -s
> >>
> >> On Friday, May 13, 2016 3:20 PM, Chris Riccomini <[email protected]>
> >> wrote:
> >>
> >> Hey Bolke,
> >>
> >> Thanks for writing this up. I don't have a ton of feedback, as I'm not
> >> terribly familiar with the internals of the scheduler, but two notes:
> >>
> >> 1. A major +1 for the celery/local executor discussion. IMO, Celery is a
> >> net-negative on this project, and should be fully removed in favor of the
> >> LocalExecutor. Splitting the scheduler from the executor in the
> >> LocalExecutor would basically give parity with Celery, AFAICT, and sounds
> >> much easier to operate to me.
> >> 2. If we are moving towards Docker as a container for DAG execution in the
> >> future, it's probably worth considering how these changes are going to
> >> affect the Docker implementation. If we do pursue (1), how does this look
> >> in a Dockerized world? Is the executor going to still exist? Would the
> >> scheduler interact directly with Kubernetes/Mesos instead?
> >>
> >> Cheers,
> >> Chris
> >>
> >> On Fri, May 13, 2016 at 3:41 AM, Bolke de Bruin <[email protected]>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > We did a video conference on the scheduler with a couple of the
> >> > committers yesterday. The meeting was not there to finalize any roadmap
> >> > but more to get a general understanding of each other's work. To keep it
> >> > as transparent as possible hereby a summary:
> >> >
> >> > Who were attending:
> >> > Max, Paul, Arthur, Dan, Sid, Bolke
> >> >
> >> > The discussion centered around the scheduler sometimes diving into
> >> > connected topics such as pooling and executors. Paul discussed his work
> >> > on making the scheduler more robust against faulty Dags and also to make
> >> > the scheduler faster by not making it dependent on the slowest parsed
> >> > Dag. PR work will be provided shortly to open it up to the community as
> >> > the aim is to have this in by end of Q2 (no promises ;-)).
> >> >
> >> > Continuing the strain of thought of making the scheduler faster the
> >> > separation of executor and scheduler was also discussed. It was remarked
> >> > by Max that doing this separation would essentially create the
> >> > equivalent of the celery workers. Sid mentioned that celery seemed to be
> >> > a culprit of setup issues and people tend to use the local executor
> >> > instead. The discussion was parked as it needs to be discussed with a
> >> > wider audience (mailing list, community) and is not something that we
> >> > think is required in the near term (obviously PRs are welcome).
> >> >
> >> > Next, we discussed some of the scheduler issues that are marked in the
> >> > attached document (
> >> > https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg). Core
> >> > issues discussed were 1) TaskInstances can be created without a DagRun,
> >> > 2) non-intuitive behavior with start_date and also depends_on_past and
> >> > 3) Lineage. It was agreed that the proposal to add a previous field to
> >> > the DagRun model and to make backfills (a.o.) use DagRun makes sense.
> >> > More discussion was around the lineage part as that involves more in
> >> > depth changes to specifically TaskInstances. Still the consensus in the
> >> > group was that it is necessary to make steps here and that they are long
> >> > overdue.
> >> >
> >> > Lastly, we discussed the draft scheduler roadmap (see doc) to see if
> >> > there were any misalignments. While there are some differences in
> >> > details we think the steps are quite compatible and the differences can
> >> > be worked out.
> >> >
> >> > So that was it, in case I missed anything correct me. In case of
> >> > questions suggestions etc don’t hesitate and put them on the list.
> >> >
> >> > Cheers
> >> > Bolke

--
Lance Norskog
[email protected]
Redwood City, CA
