Please specify what “stop doing its job” means. Does it no longer log anything? If it is still logging, the scheduler hasn’t died and hasn’t stopped.
B.

> On 24 Mar 2017, at 18:20, Gael Magnan <gaelmag...@gmail.com> wrote:
>
> We encountered the same kind of problem with the scheduler that stopped
> doing its job even after rebooting. I thought changing the start date or
> the state of a task instance might be to blame but I've never been able to
> pinpoint the problem either.
>
> We are using celery and docker if it helps.
>
> On Sat, 25 Mar 2017 at 01:53, Bolke de Bruin <bdbr...@gmail.com> wrote:
>
>> We are running *without* num runs for over a year (and never have). It is
>> a very elusive issue which has not been reproducible.
>>
>> I'd like more info on this but it needs to be very elaborate even to the
>> point of access to the system exposing the behavior.
>>
>> Bolke
>>
>>> On 24 Mar 2017, at 16:04, Vijay Ramesh <vi...@change.org> wrote:
>>>
>>> We literally have a cron job that restarts the scheduler every 30 min. Num
>>> runs didn't work consistently in rc4, sometimes it would restart itself and
>>> sometimes we'd end up with a few zombie scheduler processes and things
>>> would get stuck. Also running locally, without celery.
>>>
>>>> On Mar 24, 2017 16:02, <lro...@quartethealth.com> wrote:
>>>>
>>>> We have max runs set and still hit this. Our solution is dumber:
>>>> monitoring log output, and kill the scheduler if it stops emitting. Works
>>>> like a charm.
>>>>
>>>>> On Mar 24, 2017, at 5:50 PM, F. Hakan Koklu <fhakan.ko...@gmail.com> wrote:
>>>>>
>>>>> Some solutions to this problem are restarting the scheduler frequently or
>>>>> some sort of monitoring on the scheduler. We have set up a dag that pings
>>>>> cronitor <https://cronitor.io/> (a dead man's snitch type of service) every
>>>>> 10 minutes and the snitch pages you when the scheduler dies and does not
>>>>> send a ping to it.
>>>>>
>>>>> On Fri, Mar 24, 2017 at 1:49 PM, Andrew Phillips <aphill...@qrmedia.com>
>>>>> wrote:
>>>>>
>>>>>> We use celery and run into it from time to time.
>>>>>>>
>>>>>> Bang goes my theory ;-) At least, assuming it's the same underlying
>>>>>> cause...
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> ap
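
For anyone who wants to try the "watch the log and kill the scheduler if it goes quiet" approach mentioned in the thread, here is a minimal sketch of the detection half only. It assumes the scheduler writes to a single log file whose modification time advances whenever it emits output. The function names, the log path and the threshold are hypothetical and not taken from anyone's actual setup.

```python
import os
import time


def seconds_since_last_write(log_path, now=None):
    """Return how many seconds ago log_path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(log_path)


def is_stalled(log_path, max_silence_seconds, now=None):
    """Return True if the log is missing or has been silent for too long.

    Assumption: a healthy scheduler touches this file at least once
    every max_silence_seconds. A missing or unreadable file is treated
    as a stall.
    """
    try:
        return seconds_since_last_write(log_path, now) > max_silence_seconds
    except OSError:
        return True


if __name__ == "__main__":
    # Hypothetical path and threshold; adjust to your deployment.
    if is_stalled("/var/log/airflow/scheduler.log", max_silence_seconds=300):
        print("scheduler log looks stalled")
```

What happens on a stall (killing and restarting the process, paging someone) is deliberately left out, since that depends on how the scheduler is supervised in your environment.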