Things that might be needed for a correct multi-scheduler setup:

* A DAG-level lock held while the DAG is being evaluated
* DAG-level lock expiration, to recover from situations where the lock
  wasn't released
* Accumulation of the list of task instances to run in the database (as
  opposed to cross-process communication to a master process)
* A clearly defined master cycle that reads the list of accumulated task
  instances from the DB, then dedups, prioritizes and schedules them. That
  master cycle should have a lock (and lock expiration) as well.
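
To make the first two points concrete, here is a rough sketch of a DAG-level lock with expiration kept in a database table. Everything in it is an assumption for illustration: the `dag_lock` table, the function names and the 60-second TTL are hypothetical and are not part of Airflow. It uses the standard-library `sqlite3` module only so that it runs standalone; a real setup would target Postgres/MySQL and would probably want to take timestamps from the database server rather than from each scheduler's clock.

```python
import sqlite3
import time

LOCK_TTL_SECONDS = 60  # hypothetical expiration window, not an Airflow setting


def init_lock_table(conn):
    """Create the (hypothetical) lock table: one row per locked DAG."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dag_lock ("
        "dag_id TEXT PRIMARY KEY, "
        "owner TEXT NOT NULL, "
        "expires_at REAL NOT NULL)"
    )
    conn.commit()


def try_acquire(conn, dag_id, owner, now=None, ttl=LOCK_TTL_SECONDS):
    """Return True if `owner` now holds the lock on `dag_id`, else False."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on an uncaught exception
        # Take over a lock that has expired, or renew one we already hold.
        cur = conn.execute(
            "UPDATE dag_lock SET owner = ?, expires_at = ? "
            "WHERE dag_id = ? AND (expires_at <= ? OR owner = ?)",
            (owner, now + ttl, dag_id, now, owner),
        )
        if cur.rowcount == 1:
            return True
        try:
            # No row was updated: either nobody holds the lock yet, or
            # somebody else holds a live one. The primary key decides.
            conn.execute(
                "INSERT INTO dag_lock (dag_id, owner, expires_at) "
                "VALUES (?, ?, ?)",
                (dag_id, owner, now + ttl),
            )
            return True
        except sqlite3.IntegrityError:
            return False


def release(conn, dag_id, owner):
    """Release the lock, but only if `owner` is the current holder."""
    with conn:
        cur = conn.execute(
            "DELETE FROM dag_lock WHERE dag_id = ? AND owner = ?",
            (dag_id, owner),
        )
        return cur.rowcount == 1
```

A scheduler would call `try_acquire` before evaluating a DAG, skip the DAG if it returns False, and call `release` when done. If the scheduler dies mid-evaluation, the row simply expires and another scheduler can take it over. The master cycle lock could reuse the same table with a reserved key instead of a DAG id.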

Max

On Mon, May 22, 2017 at 12:27 AM, Bolke de Bruin <[email protected]> wrote:

> Hi Stephen,
>
> We are currently stress testing Airflow for use in a multi-master setup.
> One of my team members is doing a write up that should show up online
> shortly. TL;DR; in its current state Airflow will need some patches in
> order to run concurrently. One issue is that Airflow can have a database
> deadlock which will stop the scheduler from running. I have a patch for
> that out here (https://github.com/apache/incubator-airflow/pull/2267)
> that works fine on Postgres/MySql (tests don’t pass on sqlite yet due to
> limitations of sqlite).
>
> Your global scheduler lock (eg. by an active passive configuration) might
> make most sense for now.
>
> Bolke
>
> > On 22 May 2017, at 07:52, Stephen Rigney <[email protected]> wrote:
> >
> > Hi,
> >
> > We're running airflow in production, but for reliability (n.b. not
> > performance) we'd like to confirm if it is safe to spawn multiple
> > instances of the scheduler overlapping in time (otherwise we may need
> > to put more effort into assuring two copies aren't ever spawned at
> > once in our environment).
> >
> > It seems this officially wasn't a supported configuration back in 2015
> > (https://groups.google.com/d/msg/airbnb_airflow/-1wKa3OcwME/uATa8y3YDAAJ),
> > but has sufficient intra-airflow locking been added that it is now safe
> > to start up two temporally overlapping instances of the scheduler for
> > the same airflow system?
> >
> > Or should we hack in a "global scheduler lock" - we're not looking for
> > increased performance by scheduler parallelism, just that if we ever
> > fire up two instances of the scheduler nothing terrible happens?
> >
> > Stephen
