Re: Summary of committer meeting 2016-05-12

Lance Norskog Fri, 13 May 2016 21:06:07 -0700

A suggestion for maintaining stability: as a "test mode" item, write
database triggers for MySQL or Postgres that fail if a database transaction
puts the database in a bogus state.
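
A minimal sketch of the idea, under loud assumptions: it uses Python's
built-in sqlite3 module as a stand-in for MySQL or Postgres (whose trigger
syntax differs), and the table names, column names, and helper functions are
hypothetical rather than Airflow's real schema. The "bogus state" guarded
against here is a task instance with no matching DAG run.

```python
import sqlite3

# Hypothetical, simplified schema; not Airflow's actual tables.
SCHEMA = """
CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT);
CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, execution_date TEXT);

-- Test-mode guard: abort any insert of a task instance with no matching DAG run.
CREATE TRIGGER task_instance_requires_dag_run
BEFORE INSERT ON task_instance
WHEN NOT EXISTS (
    SELECT 1 FROM dag_run
    WHERE dag_run.dag_id = NEW.dag_id
      AND dag_run.execution_date = NEW.execution_date
)
BEGIN
    SELECT RAISE(ABORT, 'task_instance has no matching dag_run');
END;
"""


def make_test_db():
    """Create an in-memory database with the guard trigger installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_insert_task_instance(conn, task_id, dag_id, execution_date):
    """Return True if the insert commits, False if the trigger rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO task_instance VALUES (?, ?, ?)",
                (task_id, dag_id, execution_date),
            )
        return True
    except sqlite3.DatabaseError:
        return False
```

In a test run, any code path that writes an orphaned row then fails
immediately at the offending statement instead of being discovered later.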


On Fri, May 13, 2016 at 5:37 PM, siddharth anand <[email protected]> wrote:

> I'm not familiar enough with Celery -- refer to my comment about giving up
> after a day of playing with it -- to discount it totally. I'd actually feel
> better informed once I got it running and could publish a "take these
> steps", which I'm surprised that no one has done.
>
> I'm all for simple, though I'm not sure "distributed executor" necessarily
> falls in that camp. I'm open to any idea and PR, however.
>
>
> -s
>
> On Fri, May 13, 2016 at 10:40 PM, Chris Riccomini <[email protected]>
> wrote:
>
> > Hey Sid,
> >
> > I question the need for both local and celery executors (leaving
> > sequential out of this). I think all we need is a scheduler + distributed
> > executor. If you run only one each, then you have the LocalExecutor. The
> > main thing that I care about is that this one thing is easy out of the
> > box, and well tested. Right now, Celery is neither of those things.
> >
> > If we just had:
> >
> > airflow webserver
> > airflow scheduler
> > airflow executor
> >
> > I'd be happy. If `airflow executor` could start as SQLAlchemy/DB-backed
> > (just like LocalExecutor), but could be upgraded (without forcing you) to
> > Redis or RabbitMQ or SQS or whatever, great.
> >
> > I just want it easy and tested/stable.
> >
> > Cheers,
> > Chris
> >
> > On Fri, May 13, 2016 at 11:57 AM, Siddharth Anand <[email protected]>
> > wrote:
> >
> >> Bolke, Thanks for providing the document and for generally driving a
> path
> >> forward.
> >> Regarding Local vs. Celery I think the project benefits greatly from
> >> having multiple executors. Widespread adoption of Airflow involves
> keeping
> >> the barriers to adoption as low as possible. We ship Airflow with a
> SQLite
> >> DB and SequentialExecutor so that someone can simply install it on
> his/her
> >> laptop, run the examples, immediately get familiar with the awesome UI
> >> features. Soon after, he/she will want to run it on a test machine and
> >> share it with his/her colleagues/management. Since some UI features
> >> don't work with SQLAlchemy/SQLite, if the engineer were to run the
> >> SequentialExecutor, his/her colleagues would likely shoot the project
> >> down. Hence, this
> >> engineer (let's call him/her our champion) will need to install the
> >> LocalExecutor and run it against a non SQLite DB. The champion may need
> to
> >> spin up a single machine in the cloud or request a machine from his/her
> Ops
> >> team for a POC. Once the champion demos Airflow and people love it,
> people
> >> will start using it. The LocalExecutor is the easiest to use, setup and
> >> justify in terms of machine spend and complexity for a budding project
> in a
> >> company. It is also possible that scale never becomes an issue; then the
> >> level of setup was justified by the benefit to the company. BTW, at
> Agari,
> >> we didn't have great Terraform and Ansible coverage when I started using
> >> Airflow - we do now. As a result, setting up new machines in the cloud
> was
> >> a little painful.
> >> Now, once the company becomes dependent on airflow and if scale becomes
> a
> >> major issue, then it is wise to take the next architectural step, which
> in
> >> the case of the CeleryExecutor, means installing Redis and a bunch of
> >> Celery components. By providing the lowest barrier to entry for each
> level
> >> of company/developer commitment, we are simply recognizing how companies
> >> work.
> >> My point in yesterday's conversation was that we need multiple
> executors.
> >> For any scheduler/core/executor changes made to the project, we need to
> >> make sure it is tested on multiple executors.
> >> Also, another point I would like to raise. The documentation around
> >> getting Airflow running with Celery is very poor. I'd really like to
> see a
> >> tutorial with gotchas published. I tried setting it up for a day and
> then
> >> dropped it, preferring to run 2 schedulers (with LocalExecutors) for
> both
> >> increased scheduling redundancy and greater executor throughput. 10% of
> our
> >> GitHub issues reflect this lack of documentation and insight. It's great
> >> that Airbnb is using it, but there is no clear path for others to
> follow.
> >> As a result, I suspect a small minority run with Celery. And then they
> >> run into either pickling issues, celery queue management/insight
> >> questions, dag sync problems (e.g. my start date is not honored), etc...
> >> -s
> >>
> >>     On Friday, May 13, 2016 3:20 PM, Chris Riccomini <
> >> [email protected]> wrote:
> >>
> >>
> >>  Hey Bolke,
> >>
> >> Thanks for writing this up. I don't have a ton of feedback, as I'm not
> >> terribly familiar with the internals of the scheduler, but two notes:
> >>
> >> 1. A major +1 for the celery/local executor discussion. IMO, Celery is a
> >> net-negative on this project, and should be fully removed in favor of
> the
> >> LocalExecutor. Splitting the scheduler from the executor in the
> >> LocalExecutor would basically give parity with Celery, AFAICT, and
> sounds
> >> much easier to operate to me.
> >> 2. If we are moving towards Docker as a container for DAG execution in
> the
> >> future, it's probably worth considering how these changes are going to
> >> affect the Docker implementation. If we do pursue (1), how does this
> look
> >> in a Dockerized world? Is the executor going to still exist? Would the
> >> scheduler interact directly with Kubernetes/Mesos instead?
> >>
> >> Cheers,
> >> Chris
> >>
> >> On Fri, May 13, 2016 at 3:41 AM, Bolke de Bruin <[email protected]>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > We did a video conference on the scheduler with a couple of the
> >> committers
> >> > yesterday. The meeting was not there to finalize any roadmap but more
> to
> >> > get a general understanding of each other's work. To keep it as
> >> transparent
> >> > as possible hereby a summary:
> >> >
> >> > Who were attending:
> >> > Max, Paul, Arthur, Dan, Sid, Bolke
> >> >
> >> > The discussion centered around the scheduler, sometimes diving into
> >> > connected topics such as pooling and executors. Paul discussed his work
> >> on
> >> > making the scheduler more robust against faulty Dags and also to make
> >> the
> >> > scheduler faster by not making it dependent on the slowest parsed Dag.
> >> PR
> >> > work will be provided shortly to open it up to the community as the
> aim
> >> is
> >> > to have this in by end of Q2 (no promises ;-)).
> >> >
> >> > Continuing the strain of thought of making the scheduler faster the
> >> > separation of executor and scheduler was also discussed. It was
> >> remarked by
> >> > Max that doing this separation would essentially create the equivalent
> >> of
> >> > the celery workers. Sid mentioned that celery seemed to be a culprit
> of
> >> > setup issues and people tend to use the local executor instead. The
> >> > discussion was parked as it needs to be discussed with a wider
> >> > audience (mailing list, community) and is not something that we think
> >> > is required in the near term (obviously PRs are welcome).
> >> >
> >> > Next, we discussed some of the scheduler issues that are marked in the
> >> > attached document (
> >> > https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg <
> >> > https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg>). Core
> >> > issues discussed were 1) TaskInstances can be created without a
> DagRun,
> >> 2)
> >> > non-intuitive behavior with start_date and also depends_on_past and 3)
> >> > Lineage. It was agreed that the proposals to add a "previous" field to
> >> > the DagRun model and to make backfills (among others) use DagRun make
> >> > sense. More discussion
> >> > was around the lineage part as that involves more in depth changes to
> >> > specifically TaskInstances. Still the consensus in the group was that
> >> it is
> >> > necessary to make steps here and that they are long overdue.
> >> >
> >> > Lastly, we discussed the draft scheduler roadmap (see doc) to see if
> >> there
> >> > were any misalignments. While there are some differences in details we
> >> > think the steps are quite compatible and the differences can be worked
> >> out.
> >> >
> >> > So that was it; in case I missed anything, correct me. In case of
> >> > questions, suggestions, etc., don’t hesitate to put them on the list.
> >> > Cheers
> >> > Bolke
> >> >
> >>
> >>
> >>
> >
> >
>



-- 
Lance Norskog
[email protected]
Redwood City, CA
