Hey Sid, I question the need for both local and celery executors (leaving sequential out of this). I think all we need is a scheduler + distributed executor. If you run only one each, then you have the LocalExecutor. The main thing that I care about is that this one thing is easy out of the box, and well tested. Right now, Celery is neither of those things.
If we just had:

    airflow webserver
    airflow scheduler
    airflow executor

I'd be happy. If `airflow executor` could start as SQLAlchemy/DB backed (just like LocalExecutor), but could be upgraded (without forcing you) to Redis or RabbitMQ or SQS or whatever, great. I just want it easy and tested/stable.

Cheers,
Chris

On Fri, May 13, 2016 at 11:57 AM, Siddharth Anand <[email protected]> wrote:

> Bolke, thanks for providing the document and for generally driving a path
> forward.
>
> Regarding Local vs. Celery: I think the project benefits greatly from
> having multiple executors. Widespread adoption of Airflow involves keeping
> the barriers to adoption as low as possible. We ship Airflow with a SQLite
> DB and SequentialExecutor so that someone can simply install it on his/her
> laptop, run the examples, and immediately get familiar with the awesome UI
> features. Soon after, he/she will want to run it on a test machine and
> share it with his/her colleagues/management. Since some UI features don't
> work with SQLAlchemy/SQLite, if the engineer were to run the
> SequentialExecutor, his/her colleagues would likely shoot the project down.
> Hence, this engineer (let's call him/her our champion) will need to install
> the LocalExecutor and run it against a non-SQLite DB. The champion may need
> to spin up a single machine in the cloud or request a machine from his/her
> Ops team for a POC. Once the champion demos Airflow and people love it,
> people will start using it. The LocalExecutor is the easiest to use, set
> up, and justify in terms of machine spend and complexity for a budding
> project in a company. It is also possible that scale never becomes an
> issue, in which case the level of setup was justified by the benefit to the
> company. BTW, at Agari, we didn't have great Terraform and Ansible coverage
> when I started using Airflow - we do now. As a result, setting up new
> machines in the cloud was a little painful.
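[Editor's note: Sid's "LocalExecutor against a non-SQLite DB" step is, concretely, a small airflow.cfg change; a minimal sketch of the relevant settings, with an illustrative Postgres connection string:]

```ini
[core]
# LocalExecutor runs task instances as local subprocesses of the scheduler.
# It requires a real database backend (Postgres/MySQL), not SQLite.
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost/airflow
```
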
> Now, once the company becomes dependent on Airflow, and if scale becomes a
> major issue, then it is wise to take the next architectural step, which in
> the case of the CeleryExecutor means installing Redis and a bunch of
> Celery components. By providing the lowest barrier to entry for each level
> of company/developer commitment, we are simply recognizing how companies
> work.
>
> My point in yesterday's conversation was that we need multiple executors.
> For any scheduler/core/executor changes made to the project, we need to
> make sure they are tested on multiple executors.
>
> Also, another point I would like to raise: the documentation around
> getting Airflow running with Celery is very poor. I'd really like to see a
> tutorial with gotchas published. I tried setting it up for a day and then
> dropped it, preferring to run 2 schedulers (with LocalExecutors) for both
> increased scheduling redundancy and greater executor throughput. 10% of our
> GitHub issues reflect this lack of documentation and insight. It's great
> that Airbnb is using it, but there is no clear path for others to follow.
> As a result, I suspect a small minority run with Celery. And then they run
> into either pickling issues, Celery queue management/insight questions,
> DAG sync problems (e.g. my start date is not honored), etc...
>
> -s
>
> On Friday, May 13, 2016 3:20 PM, Chris Riccomini
> <[email protected]> wrote:
>
> Hey Bolke,
>
> Thanks for writing this up. I don't have a ton of feedback, as I'm not
> terribly familiar with the internals of the scheduler, but two notes:
>
> 1. A major +1 for the celery/local executor discussion. IMO, Celery is a
> net-negative on this project, and should be fully removed in favor of the
> LocalExecutor. Splitting the scheduler from the executor in the
> LocalExecutor would basically give parity with Celery, AFAICT, and sounds
> much easier to operate to me.
>
> 2.
> If we are moving towards Docker as a container for DAG execution in the
> future, it's probably worth considering how these changes are going to
> affect the Docker implementation. If we do pursue (1), how does this look
> in a Dockerized world? Is the executor still going to exist? Would the
> scheduler interact directly with Kubernetes/Mesos instead?
>
> Cheers,
> Chris
>
> On Fri, May 13, 2016 at 3:41 AM, Bolke de Bruin <[email protected]> wrote:
>
> > Hi,
> >
> > We did a video conference on the scheduler with a couple of the
> > committers yesterday. The meeting was not there to finalize any roadmap
> > but more to get a general understanding of each other's work. To keep it
> > as transparent as possible, hereby a summary:
> >
> > Who were attending: Max, Paul, Arthur, Dan, Sid, Bolke
> >
> > The discussion centered around the scheduler, sometimes diving into
> > connected topics such as pooling and executors. Paul discussed his work
> > on making the scheduler more robust against faulty Dags and also on
> > making the scheduler faster by not making it dependent on the slowest
> > parsed Dag. PR work will be provided shortly to open it up to the
> > community, as the aim is to have this in by end of Q2 (no promises ;-)).
> >
> > Continuing the train of thought of making the scheduler faster, the
> > separation of executor and scheduler was also discussed. It was remarked
> > by Max that doing this separation would essentially create the
> > equivalent of the Celery workers. Sid mentioned that Celery seemed to be
> > a culprit of setup issues and that people tend to use the LocalExecutor
> > instead. The discussion was parked, as it needs to be discussed with a
> > wider audience (mailing list, community) and is not something that we
> > think is required in the near term (obviously PRs are welcome).
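[Editor's note: the Celery setup that Sid found under-documented boils down to a config change plus a worker process per machine. A sketch of the relevant airflow.cfg lines as of this era of Airflow, with an illustrative Redis broker; exact key names may differ by version:]

```ini
[core]
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@metadata-db/airflow

[celery]
# Redis (or RabbitMQ) carries task messages from the scheduler to workers.
broker_url = redis://redis-host:6379/0
# Workers record task results here; commonly the same metadata database.
celery_result_backend = db+postgresql://airflow:airflow@metadata-db/airflow
```

[Each worker machine then needs the same airflow.cfg and the same DAG files, and runs `airflow worker`. That DAG-sync requirement is one source of the "my start date is not honored" class of problems Sid mentions.]
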
> > Next, we discussed some of the scheduler issues that are marked in the
> > attached document
> > (https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg). Core
> > issues discussed were 1) TaskInstances can be created without a DagRun,
> > 2) non-intuitive behavior with start_date and also depends_on_past, and
> > 3) lineage. It was agreed that the proposals to add a previous field to
> > the DagRun model and to make backfills (a.o.) use DagRun make sense.
> > More discussion was around the lineage part, as that involves more
> > in-depth changes, specifically to TaskInstances. Still, the consensus in
> > the group was that it is necessary to make steps here and that they are
> > long overdue.
> >
> > Lastly, we discussed the draft scheduler roadmap (see doc) to see if
> > there were any misalignments. While there are some differences in
> > details, we think the steps are quite compatible and the differences can
> > be worked out.
> >
> > So that was it; in case I missed anything, correct me. In case of
> > questions, suggestions, etc., don't hesitate to put them on the list.
> >
> > Cheers,
> > Bolke
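[Editor's note: the "previous field on DagRun" proposal can be illustrated with a toy model. All names below are hypothetical, not the actual Airflow schema; the point is that an explicit back-pointer turns a depends_on_past check into a pointer walk rather than a query for the DagRun with the greatest earlier execution_date:]

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DagRun:
    """Toy stand-in for Airflow's DagRun model (illustrative only)."""
    execution_date: str
    state: str                            # "success", "failed", "running"
    previous: Optional["DagRun"] = None   # the proposed back-pointer

def can_run(run: DagRun, depends_on_past: bool) -> bool:
    """depends_on_past check using the explicit previous link.

    Without the link, the scheduler must instead query for the run with
    the greatest execution_date less than this run's execution_date.
    """
    if not depends_on_past or run.previous is None:
        return True  # first run ever, or no past dependency
    return run.previous.state == "success"

# Chain of runs: Jan 1 succeeded, Jan 2 failed, Jan 3 is pending.
r1 = DagRun("2016-01-01", "success")
r2 = DagRun("2016-01-02", "failed", previous=r1)
r3 = DagRun("2016-01-03", "running", previous=r2)
```

[With this chain, r3 is blocked when depends_on_past is set, because its previous run failed; r2 was allowed to run since r1 succeeded.]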
