Re: Summary of committer meeting 2016-05-12

Siddharth Anand Fri, 13 May 2016 11:58:34 -0700

Bolke, Thanks for providing the document and for generally driving a path 
forward.
Regarding Local vs. Celery, I think the project benefits greatly from having 
multiple executors. Widespread adoption of Airflow involves keeping the 
barriers to adoption as low as possible. We ship Airflow with a SQLite DB and 
SequentialExecutor so that someone can simply install it on his/her laptop, run 
the examples, and immediately get familiar with the awesome UI features. Soon 
after, he/she will want to run it on a test machine and share it with his/her 
colleagues/management. Since some UI features don't work with 
SQLAlchemy/SQLite, if the engineer were to run the SequentialExecutor, his/her 
colleagues would likely shoot the project down. Hence, this engineer (let's call 
him/her our champion) will need to install the LocalExecutor and run it against 
a non-SQLite DB. The champion may need to spin up a single machine in the cloud 
or request a machine from his/her Ops team for a POC. Once the champion demos 
Airflow and people love it, people will start using it. The LocalExecutor is 
the easiest to use, set up, and justify in terms of machine spend and complexity 
for a budding project in a company. It is also possible that scale never 
becomes an issue; in that case, the level of setup was justified by the benefit to the 
company. BTW, at Agari, we didn't have great Terraform and Ansible coverage 
when I started using Airflow - we do now. As a result, setting up new machines 
in the cloud was a little painful. 
Now, once the company becomes dependent on Airflow and if scale becomes a major 
issue, then it is wise to take the next architectural step, which in the case 
of the CeleryExecutor, means installing Redis and a bunch of Celery components. 
By providing the lowest barrier to entry for each level of company/developer 
commitment, we are simply recognizing how companies work. 
My point in yesterday's conversation was that we need multiple executors. For 
any scheduler/core/executor changes made to the project, we need to make sure 
it is tested on multiple executors. 
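To make the "tested on multiple executors" point concrete, here is a minimal sketch of the idea. The two runner classes are hypothetical stand-ins, not Airflow's actual executor classes or API; the sketch only shows the shape of a check that runs the same work through every executor and compares the results.

```python
from concurrent.futures import ThreadPoolExecutor


class SequentialRunner:
    """Hypothetical stand-in for an executor that runs one task at a time."""

    def run_all(self, tasks):
        return [task() for task in tasks]


class PoolRunner:
    """Hypothetical stand-in for a parallel, single-machine executor."""

    def __init__(self, workers=4):
        self.workers = workers

    def run_all(self, tasks):
        # Submit everything, then collect results in submission order.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            futures = [pool.submit(task) for task in tasks]
            return [future.result() for future in futures]


def check_on_all_executors(tasks, runners):
    """Run the same tasks on every runner; return results keyed by runner name."""
    return {type(runner).__name__: runner.run_all(tasks) for runner in runners}


# Five trivial tasks; i=i binds the loop value at definition time.
tasks = [lambda i=i: i * i for i in range(5)]
results = check_on_all_executors(tasks, [SequentialRunner(), PoolRunner()])

# A core change should produce the same outcome on every executor.
assert len({tuple(outcome) for outcome in results.values()}) == 1
```

In a real test suite the same idea would likely be a parametrized test over the actual executors, so that any scheduler/core change is exercised once per executor.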
Also, another point I would like to raise: the documentation around getting 
Airflow running with Celery is very poor. I'd really like to see a tutorial 
with gotchas published. I tried setting it up for a day and then dropped it, 
preferring to run 2 schedulers (with LocalExecutors) for both increased 
scheduling redundancy and greater executor throughput. 10% of our GitHub issues 
reflect this lack of documentation and insight. It's great that Airbnb is using 
it, but there is no clear path for others to follow. As a result, I suspect a 
small minority run with Celery. And then they run into either pickling 
issues, Celery queue management/insight questions, DAG sync problems (e.g. my 
start date is not honored), etc. 
-s


On Friday, May 13, 2016 3:20 PM, Chris Riccomini <[email protected]> wrote:
 

 Hey Bolke,

Thanks for writing this up. I don't have a ton of feedback, as I'm not
terribly familiar with the internals of the scheduler, but two notes:

1. A major +1 for the celery/local executor discussion. IMO, Celery is a
net-negative on this project, and should be fully removed in favor of the
LocalExecutor. Splitting the scheduler from the executor in the
LocalExecutor would basically give parity with Celery, AFAICT, and sounds
much easier to operate to me.
2. If we are moving towards Docker as a container for DAG execution in the
future, it's probably worth considering how these changes are going to
affect the Docker implementation. If we do pursue (1), how does this look
in a Dockerized world? Is the executor going to still exist? Would the
scheduler interact directly with Kubernetes/Mesos instead?

Cheers,
Chris

On Fri, May 13, 2016 at 3:41 AM, Bolke de Bruin <[email protected]> wrote:

> Hi,
>
> We did a video conference on the scheduler with a couple of the committers
> yesterday. The meeting was not there to finalize any roadmap but more to
> get a general understanding of each other's work. To keep it as transparent
> as possible, here is a summary:
>
> Who were attending:
> Max, Paul, Arthur, Dan, Sid, Bolke
>
> The discussion centered on the scheduler, sometimes diving into
> connected topics such as pooling and executors. Paul discussed his work on
> making the scheduler more robust against faulty Dags and also to make the
> scheduler faster by not making it dependent on the slowest parsed Dag. PR
> work will be provided shortly to open it up to the community as the aim is
> to have this in by end of Q2 (no promises ;-)).
>
> Continuing the train of thought of making the scheduler faster, the
> separation of executor and scheduler was also discussed. It was remarked by
> Max that doing this separation would essentially create the equivalent of
> the celery workers. Sid mentioned that celery seemed to be a culprit of
> setup issues and people tend to use the local executor instead. The
> discussion was parked as it needs to be discussed with a wider audience
> (mailing list, community) and is not something that we think is required in
> the near term (obviously PRs are welcome).
>
> Next, we discussed some of the scheduler issues that are marked in the
> attached document (
> https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg). Core
> issues discussed were 1) TaskInstances can be created without a DagRun, 2)
> non-intuitive behavior with start_date and also depends_on_past and 3)
> Lineage. It was agreed that the proposal to add a previous field to the
> DagRun model and to make backfills (among others) use DagRun makes sense.
> More discussion was around the lineage part, as that involves more in-depth
> changes to TaskInstances specifically. Still, the consensus in the group was that it is
> necessary to make steps here and that they are long overdue.
>
> Lastly, we discussed the draft scheduler roadmap (see doc) to see if there
> were any misalignments. While there are some differences in details we
> think the steps are quite compatible and the differences can be worked out.
>
> So that was it; in case I missed anything, correct me. In case of questions,
> suggestions, etc., don’t hesitate to put them on the list.
> Cheers
> Bolke
>
