As I see it, the minischeduler is unrelated to this AIP and doesn't really
bear on it.  And the broader discussion about locking, while interesting,
is also beyond this AIP, and I'm not sure it makes sense to have it here.
As a side note, there is currently locking in more than one place in
Airflow, and on more than one type of entity; there are 23 usages of the
with_row_locks helper in core, some in the dag processor and some, I
assume, required for API interactions.
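
For anyone unfamiliar, that helper is used roughly like this (an
illustrative sketch from memory; `of` and `skip_locked` are the keyword
arguments I recall, and the exact signature may differ by version --
`session` here is just an open SQLAlchemy session):

    from airflow.models import DagModel
    from airflow.utils.sqlalchemy import with_row_locks

    # Build a query over DagModel rows, then wrap it so the selected rows
    # are locked (SELECT ... FOR UPDATE) on databases that support it,
    # skipping rows another scheduler has already locked.
    query = session.query(DagModel).filter(DagModel.is_paused.is_(False))
    query = with_row_locks(
        query, of=DagModel, session=session, skip_locked=True
    )
    dag_models = query.all()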

Don't read that as me not wanting to provide details.  I just want to be
clear about what is actually relevant to this AIP and what is really a
separate discussion, and I hope that, for the in-scope things, we can be
specific about the concerns that need to be addressed and the questions
that need to be answered.

Backfill, as I understand it, is fundamentally about creating dag runs.
From the scheduling perspective, backfill runs are not much different from
"normal" dag runs -- they are just created with old execution dates.  Once
the runs are created, my thinking is that the task instances of backfill
runs can be processed in the same way as those of non-backfill runs.  This
is what I propose.  Do we have agreement on this?
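
To make that concrete, here's a rough sketch of what creating one backfill
run could look like (illustrative only; the exact `create_dagrun` parameters
vary a bit between versions, and `create_backfill_run` is just a name I'm
using here):

    from airflow.utils.state import DagRunState
    from airflow.utils.types import DagRunType

    # Create a run for an old logical date; once the row exists, the
    # scheduler's task instance handling need not treat it specially.
    def create_backfill_run(dag, logical_date, session):
        return dag.create_dagrun(
            run_id=f"backfill__{logical_date.isoformat()}",
            execution_date=logical_date,
            run_type=DagRunType.BACKFILL_JOB,
            state=DagRunState.QUEUED,
            session=session,
        )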

Regarding how the dag runs are created, there's a class method on the DAG
object `dags_needing_dagruns`.  This is currently where we identify the
dags needing scheduled runs as well as dags needing dataset triggered
runs.  And the dag table is locked as part of this.  I suspect this
would be a reasonable place to identify dags needing backfill runs as
well.  Does that sound reasonable to you?
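
Roughly, I'm imagining something like the following (simplified pseudocode
of the idea, not the real method -- it omits the dataset-triggered part, and
how "pending backfill work" gets recorded is exactly what this AIP would
need to define):

    from airflow.models import DagModel
    from airflow.utils import timezone
    from airflow.utils.sqlalchemy import with_row_locks

    def dags_needing_dagruns_sketch(session):
        # Existing behavior: dags whose next scheduled run is due.
        query = session.query(DagModel).filter(
            DagModel.next_dagrun_create_after <= timezone.utcnow(),
        )
        # Hypothetical addition: also pick up dags with pending backfill
        # work, however that ends up being stored (e.g. a backfill table).
        # query = query.union(<dags with pending backfill runs>)
        return with_row_locks(
            query, of=DagModel, session=session, skip_locked=True
        )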

What other open questions are there re the scheduler?
