> I think we're the unfortunate victims of overloaded terminology. The deadlock I refer to in the AIP is not a database deadlock.
I was absolutely certain it was about DB deadlocks - partly because they are so often mentioned in the context of running backfills, and having a separate scheduler loop and a separate backfill loop was one of the likely causes, as it was correlated with running backfills in a number of cases (the other one being the mini-scheduler waiting-for-locks issue).

> Incidentally, along with many other params, I am proposing initially to remove `task_regex`

I agree. It's a good candidate to remove - especially at the beginning, to cut down on initial complexity. A different, UI-driven control over what gets backfilled (not regexp-based) can be added later.

> And on scheduling in the same scheduler loop, I agree. I did not propose otherwise. I was trying to avoid taking on a larger question about scheduling priority. I think that there have been some ideas around that simmering for a while but sorta merits its own AIP

Yeah - that was not clear at first, and that's what I called "downplaying" in terms of the potential impact it might have. I think we have to design it consciously, also in conjunction with the answer to the mini-scheduler question for AIP-72.

I think that when we do join the scheduler and backfill - taking into account that the scheduler schedules batches in a scheduling loop, as it does today - the risk of starvation (of either side: backfill vs. regular runs) is really high if we do not design the queries and sequencing properly. We add one more dimension to the current scheduling AND make it easier to run backfills - even multiple ones in parallel - without the "admin" person having any control over that. It could be limited to the admin of course, but I think the whole idea of moving it to the UI is to "democratise" backfills and make it possible for more "operations users" to run them.

With features like that I tend to think from an "actor" point of view. Previously backfills could only be run by the "Airflow gurus" - i.e. those who could actually log in to the Airflow machine and run the magic CLI command. Now backfill will be put in the hands of UI users, which will often be more people, and people less aware of what they are doing when they run a backfill. That will change the frequency and timing of backfills, I think (more frequent, and more often in parallel with regular runs).
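To make the starvation concern concrete, here is a rough, purely illustrative sketch of the kind of per-loop cap I suggested earlier (say, at most ~30% of a scheduling batch being backfill runs). None of these names (`candidate_runs`, `is_backfill`, `MAX_BACKFILL_SHARE`) exist anywhere in Airflow - it's only meant to show the shape of the control I have in mind:

```python
# Purely illustrative - none of these names exist in Airflow today.
# The idea: when the scheduler picks a batch of dag runs to examine,
# cap how many of them may come from backfills so neither side starves.

MAX_BACKFILL_SHARE = 0.3  # e.g. at most ~30% of a batch may be backfill runs


def select_batch(candidate_runs, batch_size):
    """Pick up to batch_size runs, limiting the share taken by backfill runs."""
    max_backfill = int(batch_size * MAX_BACKFILL_SHARE)
    backfill = [r for r in candidate_runs if r.is_backfill][:max_backfill]
    regular = [r for r in candidate_runs if not r.is_backfill][: batch_size - len(backfill)]
    batch = regular + backfill
    # If one side did not use its share, top the batch up from whatever is left
    # so scheduler capacity is not wasted.
    leftover = [r for r in candidate_runs if r not in batch]
    return (batch + leftover)[:batch_size]
```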
> I think mini scheduler is sort of unrelated because my expectation is that once tasks are created they will be managed same as other tasks.

It's somewhat related (though yes, it's more of an AIP-72 question). The mini-scheduler currently **actually** attempts to lock the DagRun table when it runs - this is precisely what was recently made "optionally skipped" when the mini-scheduler cannot obtain the lock immediately, because it wreaked all kinds of havoc with mapped tasks: https://github.com/apache/airflow/pull/39745 . And this is what backfill scheduling will also attempt to lock, so I think it's very much related to how it all plays "together". This:

```python
dag_run = with_row_locks(
    session.query(DagRun).filter_by(
        dag_id=ti.dag_id,
        run_id=ti.run_id,
    ),
    session=session,
    nowait=True,
).one()
```

turned into:

```python
# Re-select the row with a lock
dag_run = with_row_locks(
    session.query(DagRun).filter_by(
        dag_id=ti.dag_id,
        run_id=ti.run_id,
    ),
    session=session,
    skip_locked=True,
).one_or_none()
if not dag_run:
    cls.logger().debug("Skip locked rows, rollback")
    session.rollback()
```

It was not a deadlock "per se" - the symptom was "The Postgres instance started reporting many `could not obtain lock on row in relation "dag_run"` errors" - but the root cause was that there were many parallel "clients" attempting to lock the same DagRun table.

Ideally, I think we should only have ONE scheduling loop per scheduler, with a very clear SKIP_LOCKED approach for DagRun. Also, with AIP-72, I **think** remote locking of the DB as currently done by the mini-scheduler will not even be possible - but maybe Ash has more to say on it.
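To illustrate what I mean, a SKIP_LOCKED-style selection of dag runs in a single loop could look roughly like the sketch below. It is modelled on the `with_row_locks` helper used in the code above; the filter, ordering and limit are simplified placeholders, not a proposal for the actual scheduler query:

```python
# Rough sketch only - modelled on the with_row_locks helper shown above;
# the filter/ordering/limit are simplified placeholders, not the real query.
from airflow.models import DagRun
from airflow.utils.sqlalchemy import with_row_locks
from airflow.utils.state import DagRunState


def runs_to_examine(session, max_runs):
    query = (
        session.query(DagRun)
        .filter(DagRun.state == DagRunState.RUNNING)
        .order_by(DagRun.execution_date)
        .limit(max_runs)
    )
    # skip_locked=True: rows already locked by another scheduler (or by the
    # mini-scheduler) are simply skipped instead of blocking or raising
    # "could not obtain lock" errors.
    return with_row_locks(query, of=DagRun, session=session, skip_locked=True).all()
```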
J,

On Fri, Jul 12, 2024 at 5:14 PM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote:

> > But regarding the scheduler pressure, I think I have a bit of a different observation and the deadlock problem which is a bit downplayed in the current proposal is - IMHO - crucial problem to be solved when we want to make backfilling more accessible
>
> I think we're the unfortunate victims of overloaded terminology. The deadlock I refer to in the AIP is not a database deadlock. The word choice is not ideal, but that's what it's currently called in the code. It is referring to the scenario where no tasks can be scheduled anymore -- not due to database deadlock, but because somehow the backfill process got into a scenario where no tasks can be scheduled. For example see this comment <https://github.com/apache/airflow/blob/c09fcdf1d0e69497cf1b628df9ba3349eb688256/airflow/jobs/backfill_job_runner.py#L496-L499> and this comment <https://github.com/apache/airflow/blob/c09fcdf1d0e69497cf1b628df9ba3349eb688256/airflow/jobs/backfill_job_runner.py#L730-L732>. It's not really clear exactly how this happens. But I have a suspicion that it's probably more common when we use params like `task_regex` to run backfill on only a subset of a dag. And in that scenario, it's easier to imagine weird things happening.
>
> Incidentally, along with many other params, I am proposing initially to remove `task_regex`, and this is mentioned in the doc. My thought is that I don't really like having to deal with that complexity and it's just confusing, not super well-defined behavior, and I suspect it is the likely cause of the deadlock concerns that can be found in the code. And I figured to just lead with that in the proposal and see if anyone has any objections.
>
> > I think we should make sure that backfill runs are scheduled and queued in the executor in the same "scheduler loop" or get a better mechanism to avoid deadlocks.
>
> Yeah same thing, this is a different kind of deadlock. And on scheduling in the same scheduler loop, I agree. I did not propose otherwise. Indeed, part of the motivator is to reduce complexity and "two ways of doing things", and if I write a second scheduler *in the scheduler* for this I would not be doing that! But, there will still be *more* for the scheduler to do. I think the main thing will be, essentially, creating the dag runs. I think there will need to be a different path for creating the dag runs, but once they are created, I expect the task instances would be managed the same as any other task instance.
>
> There are other comments about deadlocks but again it's a different thing from what I call out in the proposal, and I think I agree with you and I think we are on the same page -- the tasks in a backfill should be handled in the same mechanisms as normal tasks and therefore have the same *database* deadlock risks.
>
> > But that also requires some mechanism to avoid starvation - for example we should only allow say max 30% of runs scheduled and queued within a single scheduler loop to be backfills.
>
> I was trying to avoid taking on a larger question about scheduling priority. I think that there have been some ideas around that simmering for a while but sorta merits its own AIP. That said, one limit mechanism available already that I do not propose to remove is using a pool. Interestingly, right now backfill does not seem to respect the defined pool at all, though it does have an optional pool argument. I think we should keep pool, but apply it as an optional override -- so the defined pool will be respected unless it is overridden in backfill configuration. So a user would be able to run backfill tasks in a pool that limited their concurrency.
>
> One problem with things like priority weight in Airflow is that they are not forward-looking and only evaluated in the current scheduler loop with the tasks at hand. So you might schedule a bunch of low priority things cus that's all that's there now to schedule; but in 2 minutes the highest priority thing comes up and now it can't be scheduled. This sort of complexity is why I was hoping to avoid folding it into this AIP, which I think has enough on its plate.
>
> I am open to adding some simple concurrency controls for backfill. I think it's a reasonable idea. But I'm not sure exactly what would be the best thing. But I expect some thoughts on this will materialize throughout the course of the AIP's implementation. And I do make small mention of this at the bottom of the doc, that it's under consideration. But for now it is essentially (1) pool overrides and (2) pausing the backfill.
>
> > I think it's crucial to design and describe how the "looping" process should look like for scheduler, whether we continue having mini-scheduler and how backfill scheduling processing should look like.
>
> I think mini scheduler is sort of unrelated because my expectation is that once tasks are created they will be managed same as other tasks.
>
> At a high level, my thought is that the "backfill part" of the scheduler will be in the dag run creation, and then when it comes to queuing and managing tasks, it would be handled by the normal process, which should remain more or less unchanged. I can add this to the AIP doc.
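(One side note on the pool point in the quoted message, for readers less familiar with pools: this is how a pool caps concurrency today. The pool mechanism and the `pool=` task argument are existing Airflow features; the backfill-level pool *override* discussed above is only proposed, so the exact knob for it is still undefined.)

```python
# Existing Airflow behaviour, shown for reference: a task assigned to a pool
# can never occupy more slots than the pool has, no matter how its dag run was
# created. The pool itself must exist first, e.g.:
#   airflow pools set backfill_pool 4 "cap concurrency of backfill-heavy tasks"
import pendulum

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def pooled_example():
    @task(pool="backfill_pool")  # at most 4 of these run at once, across all dag runs
    def heavy_step():
        ...

    heavy_step()


pooled_example()
```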