Things that might be needed for a correct multi-scheduler setup:
* A DAG-level lock held while the DAG is being evaluated
* DAG-level lock expiration, to recover from situations where the
lock wasn't released
* Accumulation of the list of task instances to run in the database (as
opposed to cross-process communication to a master process)
* A clearly defined master cycle that reads the list of accumulated task
instances from the DB, dedups, prioritizes and schedules. That master cycle
should have a lock (and lock expiration) as well.
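
To make the first two bullets concrete, here is a rough sketch of a DAG-level lock with expiration kept in a database table. It is only an illustration, not how Airflow works today: the `dag_lock` table and the `acquire_lock` / `release_lock` helpers are hypothetical names, and it uses SQLite from the Python standard library purely so that it runs standalone. On Postgres/MySQL the same idea would need the equivalent upsert or row-locking syntax.

```python
import sqlite3
import time

# Hypothetical schema (not an Airflow table): one row per DAG being evaluated.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dag_lock (
    dag_id     TEXT PRIMARY KEY,
    owner      TEXT NOT NULL,
    expires_at REAL NOT NULL
)
"""


def acquire_lock(conn, dag_id, owner, ttl, now=None):
    """Try to take the lock on dag_id for ttl seconds; return True on success."""
    now = time.time() if now is None else now
    with conn:  # single transaction: commit on success, roll back on error
        # Expiration: drop a stale lock left by a scheduler that never released it.
        conn.execute(
            "DELETE FROM dag_lock WHERE dag_id = ? AND expires_at <= ?",
            (dag_id, now),
        )
        # The primary key turns this insert into a no-op if a live lock exists.
        cur = conn.execute(
            "INSERT OR IGNORE INTO dag_lock (dag_id, owner, expires_at) "
            "VALUES (?, ?, ?)",
            (dag_id, owner, now + ttl),
        )
        return cur.rowcount == 1


def release_lock(conn, dag_id, owner):
    """Release the lock only if this owner still holds it."""
    with conn:
        cur = conn.execute(
            "DELETE FROM dag_lock WHERE dag_id = ? AND owner = ?",
            (dag_id, owner),
        )
        return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    print(acquire_lock(conn, "example_dag", "scheduler-a", ttl=60))
    print(acquire_lock(conn, "example_dag", "scheduler-b", ttl=60))
    print(release_lock(conn, "example_dag", "scheduler-a"))
```

The master cycle could presumably take the same kind of lock under a single well-known key before it reads, dedups and schedules the accumulated task instances.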

Max

On Mon, May 22, 2017 at 12:27 AM, Bolke de Bruin <[email protected]> wrote:

> Hi Stephen,
>
> We are currently stress testing Airflow for use in a multi-master setup.
> One of my team members is doing a write up that should show up online
> shortly. TL;DR: in its current state Airflow will need some patches in
> order to run concurrently. One issue is that Airflow can have a database
> deadlock which will stop the scheduler from running. I have a patch for
> that out here (https://github.com/apache/incubator-airflow/pull/2267) that
> works fine on Postgres/MySQL (tests don’t pass on SQLite yet due to
> limitations of SQLite).
>
> Your global scheduler lock (e.g. via an active/passive configuration) might
> make most sense for now.
>
> Bolke
>
> > On 22 May 2017, at 07:52, Stephen Rigney <[email protected]> wrote:
> >
> > Hi,
> >
> > We're running airflow in production, but for reliability (n.b. not
> > performance) we'd like to confirm if it is safe to spawn multiple instances
> > of the scheduler overlapping in time (otherwise we may need to put more
> > effort into assuring two copies aren't ever spawned at once in our
> > environment).
> >
> >
> > It seems this officially wasn't a supported configuration back in 2015
> > (https://groups.google.com/d/msg/airbnb_airflow/-1wKa3OcwME/uATa8y3YDAAJ),
> > but has sufficient intra-airflow locking been added that it is now safe to
> > start up two temporally overlapping instances of the scheduler for the same
> > airflow system?
> >
> >
> > Or should we hack in a "global scheduler lock" - we're not looking for
> > increased performance by scheduler parallelism, just that if we ever fire
> > up two instances of the scheduler nothing terrible happens?
> >
> >
> > Stephen
>
>
