@Maxime, I take your point. What I'd prefer is to have one stable,
first-class way to do distributed execution. I would also like that
solution not to peg me to RabbitMQ or something wacky like
that--at least initially.

My concerns with Celery as it is currently:

1. It seems to have a lot of problems (
https://issues.apache.org/jira/issues/?jql=component%20%3D%20celery%20AND%20project%20%3D%20AIRFLOW).
Some of these are pretty serious (e.g. XCom issue). By comparison, I've
experienced zero local executor problems.
2. It doesn't have a good out-of-the-box experience currently. If I could
run it as simply as `airflow scheduler` out of the box today, I probably
would.
3. We might find that we want increasing control of the executor
(especially if we move toward Docker), and having Celery might get in the
way of that.
4. Related to out-of-the-box experience: I don't want to have to run a queueing
system (at least initially). We currently run > 200 (small-ish) DAGs at 15m
intervals without a queue on a single LocalExecutor, and we don't
experience any problems with this.

If these are addressed, then I don't have a problem with Celery per se.

Cheers,
Chris



On Fri, May 13, 2016 at 10:47 AM, Chris Riccomini <[email protected]>
wrote:

> +1 to what Sid said.
>
> On Fri, May 13, 2016 at 10:33 AM, Siddharth Anand <[email protected]>
> wrote:
>
>> I mentioned this on the call yesterday as well. Going forward, all
>> meetings will be community-inclusive. We can follow what Apache Beam is
>> doing ( they have 10-15+ video windows at a time ) in this respect. We will
>> need a topic and agenda for each meeting, so that they are not
>> misconstrued as "office-hours" or free discussion meetings. We can hold
>> those as well, but the topic and agenda of each meeting will help
>> effectively manage larger meetings.
>>
>> We can use Gitter, Twitter, Confluence, and the dev list to announce the
>> meetings and share the agenda ahead of time.
>> Also, I feel there is no need for committers to be the sole initiators of
>> these meetings. We should make that clear to the community. However, if
>> some users/contributors do set up a meeting, it may be a good idea for some
>> committers to attend to help answer any questions, etc...
>> -s
>>
>>
>>     On Friday, May 13, 2016 4:55 PM, Jakob Homan <[email protected]>
>> wrote:
>>
>>
>>  Cool.  Was this a public meeting?  Will the next one be?
>>
>> On 13 May 2016 at 08:20, Chris Riccomini <[email protected]> wrote:
>> > Hey Bolke,
>> >
>> > Thanks for writing this up. I don't have a ton of feedback, as I'm not
>> > terribly familiar with the internals of the scheduler, but two notes:
>> >
>> > 1. A major +1 for the celery/local executor discussion. IMO, Celery is a
>> > net-negative on this project, and should be fully removed in favor of
>> the
>> > LocalExecutor. Splitting the scheduler from the executor in the
>> > LocalExecutor would basically give parity with Celery, AFAICT, and
>> sounds
>> > much easier to operate to me.
>> > 2. If we are moving towards Docker as a container for DAG execution in
>> the
>> > future, it's probably worth considering how these changes are going to
>> > affect the Docker implementation. If we do pursue (1), how does this
>> look
>> > in a Dockerized world? Is the executor going to still exist? Would the
>> > scheduler interact directly with Kubernetes/Mesos instead?
>> >
>> > Cheers,
>> > Chris
>> >
>> > On Fri, May 13, 2016 at 3:41 AM, Bolke de Bruin <[email protected]>
>> wrote:
>> >
>> >> Hi,
>> >>
>> >> We did a video conference on the scheduler with a couple of the
>> committers
>> >> yesterday. The meeting was not there to finalize any roadmap but more
>> to
>> >> get a general understanding of each other's work. To keep it as
>> transparent
>> >> as possible hereby a summary:
>> >>
>> >> Who attended:
>> >> Max, Paul, Arthur, Dan, Sid, Bolke
>> >>
>> >> The discussion centered around the scheduler, sometimes diving into
>> >> connected topics such as pooling and executors. Paul discussed his work
>> on
>> >> making the scheduler more robust against faulty Dags and also to make
>> the
>> >> scheduler faster by not making it dependent on the slowest parsed Dag.
>> PR
>> >> work will be provided shortly to open it up to the community as the
>> aim is
>> >> to have this in by end of Q2 (no promises ;-)).
>> >>
>> >> Continuing the train of thought of making the scheduler faster, the
>> >> separation of executor and scheduler was also discussed. It was
>> remarked by
>> >> Max that doing this separation would essentially create the equivalent
>> of
>> >> the celery workers. Sid mentioned that celery seemed to be a culprit of
>> >> setup issues and people tend to use the local executor instead. The
>> >> discussion was parked as it needs to be discussed with a wider audience
>> >> (mailing list, community) and is not something that we think is
>> required in
>> >> the near term (obviously PRs are welcome).
>> >>
>> >> Next, we discussed some of the scheduler issues that are marked in the
>> >> attached document (
>> >> https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg). Core
>> >> issues discussed were 1) TaskInstances can be created without a
>> DagRun, 2)
>> >> non-intuitive behavior with start_date and also depends_on_past and 3)
>> >> Lineage. It was agreed that the proposals to add a previous field to the
>> DagRun
>> >> model and to make backfills (among others) use DagRun make sense. More
>> discussion
>> >> was around the lineage part as that involves more in depth changes to
>> >> specifically TaskInstances. Still the consensus in the group was that
>> it is
>> >> necessary to make steps here and that they are long overdue.
>> >>
>> >> Lastly, we discussed the draft scheduler roadmap (see doc) to see if
>> there
>> >> were any misalignments. While there are some differences in details we
>> >> think the steps are quite compatible and the differences can be worked
>> out.
>> >>
>> >> So that was it; in case I missed anything, correct me. In case of
>> questions,
>> >> suggestions, etc., don’t hesitate to put them on the list.
>> >> Cheers
>> >> Bolke
>> >>
>>
>>
>>
>
>
