Hi,

We held a video conference on the scheduler with a couple of the committers
yesterday. The meeting was not meant to finalize any roadmap but rather to get
a general understanding of each other's work. To keep things as transparent as
possible, here is a summary:

Attendees:
Max, Paul, Arthur, Dan, Sid, Bolke

The discussion centered on the scheduler, sometimes diving into connected
topics such as pooling and executors. Paul discussed his work on making the
scheduler more robust against faulty DAGs and on making it faster by no longer
having it depend on the slowest-parsing DAG. PRs will be provided shortly to
open this work up to the community, as the aim is to have it in by the end of
Q2 (no promises ;-)).

Continuing the train of thought on making the scheduler faster, the separation
of executor and scheduler was also discussed. Max remarked that this separation
would essentially create the equivalent of the Celery workers. Sid mentioned
that Celery seemed to be a common source of setup issues and that people tend
to use the local executor instead. The discussion was parked, as it needs to be
held with a wider audience (mailing list, community) and is not something that
we think is required in the near term (obviously PRs are welcome).

Next, we discussed some of the scheduler issues that are marked in the attached
document (https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg). The
core issues discussed were 1) TaskInstances can be created without a DagRun, 2)
non-intuitive behavior of start_date and depends_on_past, and 3) lineage.
It was agreed that the proposals to add a previous field to the DagRun model
and to make backfills (among others) use DagRun make sense. There was more
discussion around the lineage part, as that involves more in-depth changes,
specifically to TaskInstances. Still, the consensus in the group was that it is
necessary to take steps here and that they are long overdue.
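
To make the "previous field" idea a bit more concrete, here is a rough sketch
of how an explicit link between runs could be used to resolve depends_on_past.
This is only an illustration: the class below is a simplified stand-in and not
the actual DagRun model, and the names task_states and past_dependency_met are
hypothetical, as are the dag_id and dates.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class DagRun:
    """Hypothetical, simplified stand-in; not the real DagRun model."""
    dag_id: str
    execution_date: datetime
    # The proposed explicit link to the run that came before this one.
    previous: Optional["DagRun"] = None
    # Assumed bookkeeping for this sketch: task_id -> state string.
    task_states: Dict[str, str] = field(default_factory=dict)


def past_dependency_met(run: DagRun, task_id: str) -> bool:
    """Check depends_on_past by following the link instead of date arithmetic."""
    if run.previous is None:
        # First run of the DAG: there is nothing in the past to wait for.
        return True
    return run.previous.task_states.get(task_id) == "success"


first = DagRun("example_dag", datetime(2016, 4, 1))
second = DagRun("example_dag", datetime(2016, 4, 2), previous=first)

first.task_states["load"] = "failed"
blocked = past_dependency_met(second, "load")    # previous run failed

first.task_states["load"] = "success"
unblocked = past_dependency_met(second, "load")  # previous run succeeded
```

With a link like this, the first run is recognizable by previous being empty,
so the check would not need to be derived from start_date.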

Lastly, we discussed the draft scheduler roadmap (see doc) to see if there were
any misalignments. While there are some differences in the details, we think
the steps are quite compatible and the differences can be worked out.

So that was it. If I missed anything, please correct me. If you have questions,
suggestions, etc., don't hesitate to put them on the list.
Cheers
Bolke
