We literally have a cron job that restarts the scheduler every 30 minutes. Setting num_runs didn't work consistently in rc4: sometimes the scheduler would restart itself, and sometimes we'd end up with a few zombie scheduler processes and things would get stuck. We're also running locally, without Celery.
On Mar 24, 2017 16:02, <lro...@quartethealth.com> wrote:
> We have max runs set and still hit this. Our solution is dumber:
> monitoring log output, and kill the scheduler if it stops emitting. Works
> like a charm.
>
> > On Mar 24, 2017, at 5:50 PM, F. Hakan Koklu <fhakan.ko...@gmail.com>
> > wrote:
> >
> > Some solutions to this problem are restarting the scheduler frequently or
> > some sort of monitoring on the scheduler. We have set up a DAG that pings
> > cronitor <https://cronitor.io/> (a dead man's snitch type of service) every
> > 10 minutes, and the snitch pages you when the scheduler dies and stops
> > sending pings.
> >
> > On Fri, Mar 24, 2017 at 1:49 PM, Andrew Phillips <aphill...@qrmedia.com>
> > wrote:
> >
> >>> We use celery and run into it from time to time.
> >>
> >> Bang goes my theory ;-) At least, assuming it's the same underlying
> >> cause...
> >>
> >> Regards
> >>
> >> ap