Can I assume your db is also not a SPOF? It seems a waste of processor time to me to have a scheduler on every node.
B.

Sent from my iPhone

> On 24 Feb 2017, at 20:39, Jason Chen <[email protected]> wrote:
>
> Thanks, Arthur.
>
> I think it avoids "single point of failure" (and maybe balance the
> scheduling load between nodes ?)
> Given a celery cluster, if the only node running scheduler is down, the
> whole cluster will fail to schedule jobs.
> Any downside why not having multiple schedulers ?
>
> -Jason
>
> On Fri, Feb 24, 2017 at 11:35 AM, Arthur Wiedmer <[email protected]> wrote:
>
>> Jason,
>>
>> Why do you need a scheduler running on each node?
>>
>> We have a single scheduler powering work on many nodes each running a
>> celery worker via "airflow worker". We have one large metadata MySQL
>> instance.
>>
>> Best,
>> Arthur
>>
>> On Fri, Feb 24, 2017 at 11:04 AM, Jason Chen <[email protected]> wrote:
>>
>>> A side question related to this topic:
>>> I am running Airflow w/ celery executor in multiple nodes. Each node is
>>> running celery, worker, scheduler and webserver.
>>> These nodes are registered to a Redis for celery queue and these nodes are
>>> sharing the same dags, logs folder (and MySQL)
>>> It seems running fine.
>>> Any concerns or suggestions ?
>>> I am thinking celery executor is designed for distributed env.
>>>
>>> Thanks.
>>>
>>> -Jason
>>>
>>> On Fri, Feb 24, 2017 at 10:58 AM, Jason Jho <[email protected]> wrote:
>>>
>>>> Seems like this would inherently tied to the VM it's running on. Either
>>>> way, would love to hear about any experiences as well!
>>>>
>>>> On Fri, Feb 24, 2017 at 1:52 PM Wilson Lian <[email protected]> wrote:
>>>>
>>>>> Out of curiosity, has anyone heard any war stories re: reaching the limits
>>>>> of a single scheduler in terms of the number of potentially-schedulable
>>>>> DAGs?
>>>>>
>>>>> On Fri, Feb 24, 2017 at 10:25 AM, Dan Davydov <[email protected]> wrote:
>>>>>
>>>>>> We just had two running by accident for some period of time.
>>>>>>
>>>>>> On Feb 24, 2017 5:52 AM, "Jason Jho" <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Dan / Sid,
>>>>>>>
>>>>>>> Would you be able to elaborate on the multiple scheduler setup? Curious how
>>>>>>> that would have been deployed. Was the purpose to have some kind of
>>>>>>> failover or to distribute execution of jobs?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> On Fri, Feb 24, 2017 at 3:49 AM Dan Davydov <[email protected]> wrote:
>>>>>>>
>>>>>>>> Fwiw Airbnb was running multiple schedulers for a short while on 1.7.1 and
>>>>>>>> we didn't seem to have issues.
>>>>>>>>
>>>>>>>> On Feb 24, 2017 12:25 AM, "Bolke de Bruin" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> While I agree with the assessment of Sid that a lot has changed and we do
>>>>>>>>> not officially test on multiple schedulers, many changes were in the area
>>>>>>>>> of proper locking which benefit multiple schedulers. In addition the tasks
>>>>>>>>> themselves have built in checks that they don’t run twice at the same
>>>>>>>>> time.
>>>>>>>>>
>>>>>>>>> Yet YMMV.
>>>>>>>>>
>>>>>>>>> Bolke
>>>>>>>>>
>>>>>>>>>> On 24 Feb 2017, at 03:13, siddharth anand <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> I did run 2 or more schedulers with Local Executors up until mid last
>>>>>>>>>> year. There have been enough changes to the code and feature additions that
>>>>>>>>>> I don't think this is a recommended practice at this point. Also, there is
>>>>>>>>>> not a lot of synchronization in the scheduler to ensure this will work.
>>>>>>>>>>
>>>>>>>>>> -s
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 9, 2017 at 6:47 AM, matus valo <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I am considering deployment of airflow as pipeline framework. I have found
>>>>>>>>>>> out multiple articles explaining deployment of airflow in distributed
>>>>>>>>>>> environment (e.g. [1]). Unfortunately, I was not able to find out any use
>>>>>>>>>>> case where scheduler is deployed distributed on multiple nodes. Is it
>>>>>>>>>>> possible to have scheduler distributed on multiple nodes to prevent single
>>>>>>>>>>> point of failure? I haven’t found any mention about it in documentation. I
>>>>>>>>>>> have found out in [2] that it is not possible but on the other hand in [3]
>>>>>>>>>>> is reference that this can be solved in new version of airflow.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Matus
>>>>>>>>>>>
>>>>>>>>>>> [1] http://site.clairvoyantsoft.com/setting-apache-airflow-cluster/
>>>>>>>>>>>
>>>>>>>>>>> [2] https://groups.google.com/forum/#!topic/airbnb_airflow/-1wKa3OcwME
>>>>>>>>>>>
>>>>>>>>>>> [3] https://issues.apache.org/jira/browse/AIRFLOW-678
