We encountered the same kind of problem with the scheduler, which stopped doing its job even after rebooting. I thought changing the start date or the state of a task instance might be to blame, but I've never been able to pinpoint the problem either.
We are using celery and docker, if it helps.

On Sat, Mar 25, 2017 at 01:53, Bolke de Bruin <bdbr...@gmail.com> wrote:

> We are running *without* num runs for over a year (and never have). It is
> a very elusive issue which has not been reproducible.
>
> I like more info on this but it needs to be very elaborate even to the
> point of access to the system exposing the behavior.
>
> Bolke
>
> Sent from my iPhone
>
> > On 24 Mar 2017, at 16:04, Vijay Ramesh <vi...@change.org> wrote:
> >
> > We literally have a cron job that restarts the scheduler every 30 min. Num
> > runs didn't work consistently in rc4, sometimes it would restart itself and
> > sometimes we'd end up with a few zombie scheduler processes and things
> > would get stuck. Also running locally, without celery.
> >
> >> On Mar 24, 2017 16:02, <lro...@quartethealth.com> wrote:
> >>
> >> We have max runs set and still hit this. Our solution is dumber:
> >> monitoring log output, and kill the scheduler if it stops emitting. Works
> >> like a charm.
> >>
> >>> On Mar 24, 2017, at 5:50 PM, F. Hakan Koklu <fhakan.ko...@gmail.com> wrote:
> >>>
> >>> Some solutions to this problem is restarting the scheduler frequently or
> >>> some sort of monitoring on the scheduler. We have set up a dag that pings
> >>> cronitor <https://cronitor.io/> (a dead man's snitch type of service) every
> >>> 10 minutes and the snitch pages you when the scheduler dies and does not
> >>> send a ping to it.
> >>>
> >>> On Fri, Mar 24, 2017 at 1:49 PM, Andrew Phillips <aphill...@qrmedia.com>
> >>> wrote:
> >>>
> >>>> We use celery and run into it from time to time.
> >>>>
> >>>> Bang goes my theory ;-) At least, assuming it's the same underlying
> >>>> cause...
> >>>>
> >>>> Regards
> >>>>
> >>>> ap
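
For reference, the "monitor the log output and kill the scheduler if it stops emitting" workaround mentioned in the thread could look roughly like the sketch below. This is only an illustration, not what anyone in the thread actually runs: the function names are hypothetical, the scheduler is assumed to write to a single log file whose modification time advances while it is healthy, and the caller is assumed to know the scheduler's PID and to have something else (cron, supervisor, docker restart policy) that brings it back up.

```python
import os
import signal
import time


def seconds_since_last_write(log_path, now=None):
    """Return how many seconds ago the log file was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(log_path)


def scheduler_is_stalled(log_path, max_silence_seconds, now=None):
    """True when the log has been silent for longer than the threshold."""
    return seconds_since_last_write(log_path, now) > max_silence_seconds


def kill_if_stalled(log_path, pid, max_silence_seconds=300, now=None):
    """Send SIGTERM to pid if the log went quiet.

    Returns True if a signal was sent, False otherwise. The 300 second
    default is an arbitrary assumption; tune it to how often the
    scheduler normally logs.
    """
    if not scheduler_is_stalled(log_path, max_silence_seconds, now):
        return False
    os.kill(pid, signal.SIGTERM)
    return True
```

Run periodically (for example from cron), this only terminates the process; restarting it is left to whatever already supervises the scheduler.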