A few thoughts on moving away from Celery and on the Executor interface. To me, LocalExecutor means local as in "in-process": it's implemented as a local multiprocess pool/queue, so making it remote or "out of process" changes its definition and premise. Let's then refer to what we're really talking about as "creating a RemoteExecutor that doesn't depend on Celery".
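For reference, here's a toy version of what "local as in-process" means to me. This is not Airflow's actual LocalExecutor code, just its shape: a pool of child processes fed from a queue that lives inside the scheduler process.

    import multiprocessing
    import subprocess


    def _worker(task_queue, result_queue):
        # Pull shell commands off the shared queue until we get the
        # poison pill (None), then exit.
        while True:
            command = task_queue.get()
            if command is None:
                break
            returncode = subprocess.call(command, shell=True)
            result_queue.put((command, returncode))


    class InProcessExecutor:
        # Everything lives inside the parent (scheduler) process: the
        # queues are in-process objects and the workers are its children.
        def __init__(self, parallelism=4):
            self.task_queue = multiprocessing.Queue()
            self.result_queue = multiprocessing.Queue()
            self.workers = [
                multiprocessing.Process(
                    target=_worker,
                    args=(self.task_queue, self.result_queue))
                for _ in range(parallelism)
            ]
            for worker in self.workers:
                worker.start()

        def execute(self, command):
            self.task_queue.put(command)

        def end(self):
            for _ in self.workers:
                self.task_queue.put(None)  # one poison pill per worker
            for worker in self.workers:
                worker.join()


    if __name__ == "__main__":
        executor = InProcessExecutor(parallelism=2)
        executor.execute("echo hello")
        executor.end()

The moment you move those workers to other machines, the in-process queue has to become a network transport, and that's where the premise changes.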
Now about this RemoteExecutor idea. If you boil it down to its essence, a remote executor isn't that far from Celery itself: each worker process listens for messages, is parameterized to have N slots, manages and returns its state, and maybe listens only to certain types of messages (let's call those queues). Now how do we circulate messages around? We have a database, so maybe we use the database as a message queue? Databases don't make for scalable message queues, so should we support a proper message queue? What about Redis? RabbitMQ? Kafka? SQS? Let's write an interface for that. Wait, am I talking about RemoteExecutor or CeleryExecutor at this point? Maybe RemoteExecutor is CeleryExecutor.

Side note: Celery supports using SqlAlchemy (any database) as a message queue. Maybe we make that the default setup: then we don't point people to LocalExecutor, but to CeleryExecutor with the DB as a backend. I'll sketch both of these below.
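To make the "essence" part concrete, each worker would boil down to something like the following. All the broker-facing names here (broker, consume, publish_state, message.command) are made up for illustration, since picking that transport interface is exactly the open question:

    import subprocess
    import threading


    def run_worker(broker, queues, slots=4):
        # Consume task messages from the given queues, run at most
        # `slots` commands concurrently, and report state back through
        # the broker (a stand-in for whatever transport we pick:
        # DB table, Redis, RabbitMQ, ...).
        free_slots = threading.Semaphore(slots)

        def run_task(message):
            broker.publish_state(message.task_id, "running")
            try:
                returncode = subprocess.call(message.command, shell=True)
                state = "success" if returncode == 0 else "failed"
                broker.publish_state(message.task_id, state)
            finally:
                free_slots.release()

        while True:
            free_slots.acquire()              # wait for a free slot
            message = broker.consume(queues)  # blocks until a message arrives
            threading.Thread(target=run_task, args=(message,)).start()

And as for the side note, the "DB as a backend" setup already exists today via kombu's SqlAlchemy transport; it would look something like this (exact key names depend on the Airflow version, and the URLs are placeholders):

    # airflow.cfg
    [core]
    executor = CeleryExecutor

    [celery]
    # the "sqla+" prefix lets any SqlAlchemy-supported DB act as the broker
    broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow
    celery_result_backend = db+mysql://airflow:airflow@localhost:3306/airflow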
Max

On Fri, May 13, 2016 at 10:05 AM, Bolke de Bruin <[email protected]> wrote:

> It was, but it wasn't broadly communicated. We will repeat it, with an
> open invitation, every week or two weeks.
>
> Now to figure out how to share a video link that works continuously
> without me or someone else being there every time...
>
> B.
>
> Sent from my iPhone
>
> > On 13 mei 2016, at 18:55, Jakob Homan <[email protected]> wrote:
> >
> > Cool. Was this a public meeting? Will the next one be?
> >
> >> On 13 May 2016 at 08:20, Chris Riccomini <[email protected]> wrote:
> >> Hey Bolke,
> >>
> >> Thanks for writing this up. I don't have a ton of feedback, as I'm not
> >> terribly familiar with the internals of the scheduler, but two notes:
> >>
> >> 1. A major +1 for the celery/local executor discussion. IMO, Celery is
> >> a net negative on this project, and should be fully removed in favor
> >> of the LocalExecutor. Splitting the scheduler from the executor in the
> >> LocalExecutor would basically give parity with Celery, AFAICT, and
> >> sounds much easier to operate to me.
> >> 2. If we are moving towards Docker as a container for DAG execution in
> >> the future, it's probably worth considering how these changes are
> >> going to affect the Docker implementation. If we do pursue (1), how
> >> does this look in a Dockerized world? Is the executor still going to
> >> exist? Would the scheduler interact directly with Kubernetes/Mesos
> >> instead?
> >>
> >> Cheers,
> >> Chris
> >>
> >>> On Fri, May 13, 2016 at 3:41 AM, Bolke de Bruin <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> We did a video conference on the scheduler with a couple of the
> >>> committers yesterday. The meeting was not meant to finalize any
> >>> roadmap, but more to get a general understanding of each other's
> >>> work. To keep it as transparent as possible, here is a summary.
> >>>
> >>> Attending: Max, Paul, Arthur, Dan, Sid, Bolke
> >>>
> >>> The discussion centered around the scheduler, sometimes diving into
> >>> connected topics such as pooling and executors. Paul discussed his
> >>> work on making the scheduler more robust against faulty Dags and on
> >>> making the scheduler faster by not making it dependent on the slowest
> >>> parsed Dag. PR work will be provided shortly to open it up to the
> >>> community, as the aim is to have this in by the end of Q2 (no
> >>> promises ;-)).
> >>>
> >>> Continuing the train of thought of making the scheduler faster, the
> >>> separation of executor and scheduler was also discussed. It was
> >>> remarked by Max that doing this separation would essentially create
> >>> the equivalent of the celery workers. Sid mentioned that celery
> >>> seemed to be the culprit behind many setup issues and that people
> >>> tend to use the local executor instead. The discussion was parked as
> >>> it needs to be discussed with a wider audience (mailing list,
> >>> community) and is not something that we think is required in the near
> >>> term (obviously PRs are welcome).
> >>>
> >>> Next, we discussed some of the scheduler issues that are marked in
> >>> the attached document (
> >>> https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg). Core
> >>> issues discussed were 1) TaskInstances can be created without a
> >>> DagRun, 2) non-intuitive behavior with start_date and also
> >>> depends_on_past, and 3) lineage. It was agreed that the proposals to
> >>> add a "previous" field to the DagRun model and to make backfills
> >>> (among others) use DagRun make sense. More discussion was around the
> >>> lineage part, as that involves more in-depth changes, specifically to
> >>> TaskInstances. Still, the consensus in the group was that it is
> >>> necessary to make steps here and that they are long overdue.
> >>>
> >>> Lastly, we discussed the draft scheduler roadmap (see doc) to see if
> >>> there were any misalignments. While there are some differences in
> >>> details, we think the steps are quite compatible and the differences
> >>> can be worked out.
> >>>
> >>> So that was it; in case I missed anything, correct me. In case of
> >>> questions, suggestions, etc., don't hesitate to put them on the list.
> >>>
> >>> Cheers,
> >>> Bolke
