What about appending this to your crontab:

* * * * * webservice start
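
A slightly more cautious variant would have cron run a small wrapper that only calls the start command when a check says nothing is running. Below is a rough sketch of that idea, not something I have tested on the grid. The function name `ensure_running` is made up for illustration, and the example check (grepping `qstat` output for "lighttpd") is an assumption about how the webservice job is named, so adjust it to whatever your job actually looks like.

```shell
#!/bin/sh
# Sketch of a cron-driven watchdog: run the start command only when a
# check command reports that the service is not running.
# Both commands are passed in as strings, so nothing here depends on a
# particular grid engine setup.

# ensure_running CHECK_CMD START_CMD
#   Runs START_CMD only if CHECK_CMD exits non-zero.
#   Returns 0 if the check passed, otherwise START_CMD's exit status.
ensure_running() {
    check_cmd=$1
    start_cmd=$2
    if sh -c "$check_cmd" >/dev/null 2>&1; then
        return 0    # check passed: assume the service is up, do nothing
    fi
    sh -c "$start_cmd"
}

# Example wiring (ASSUMPTION: the webservice job appears in `qstat`
# output under a name containing "lighttpd"):
#   ensure_running 'qstat | grep -q lighttpd' 'webservice start'
```

The crontab entry would then point at the wrapper script instead of calling `webservice start` directly. Keeping the check and the start command as parameters means the same wrapper also works for a hand-rolled webservice with a different job name.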

On Thu, Jul 10, 2014 at 5:34 PM, Tim Landscheidt <[email protected]> wrote:
> Magnus Manske <[email protected]> wrote:
>
>> I've been manually restarting about a dozen webservices for my tools in the
>> last 24h.
>
>> And before you say it, some of those were Hedonil's hand-rolled webservice.
>
>> Could we PLEASE either have a Labs-official, auto- and self-restarting
>> webservice, or something a little more stable than lighttpd (or a more
>> stable way to run it)?
>
> I looked at all the tools you are a developer of and I assume
> you speak about wikidata-todo.  This has some logs that
> appear to have indications of OOM shutdowns.
>
> You use a custom lighttpd configuration, and I'm not sure if
> the decision to have two PHP FCGIs doubles the memory
> requirements, at the moment using 6 GBytes out of 7 GBytes
> requested.
>
> What is clear however is that your PHP script:
>
> | 2014-07-10 14:11:39: (mod_fastcgi.c.2701) FastCGI-stderr: PHP Fatal error:
> Allowed memory size of 2621440000 bytes exhausted (tried to allocate 71
> bytes) in /data/project/wikidata-todo/public_html/autolist2.php on line 201
>
> uses almost 2.5 GByte of memory -- if I don't misread the
> documentation -- per /request/.
>
> Memory is cheap and we could just increase the requested
> limit, but I assume there are some PHP developers around who
> might want to have a poke at optimizing
> <https://bitbucket.org/magnusmanske/wikidata-todo/src/master/public_html/autolist2.php>.
>
> Regarding self-restarting web services, with continuous jobs
> we have a "while ! $JOB; do sleep 5; done" loop that ensures
> that the job is restarted if it aborts.  This however does
> not work on OOMs that are the predominant cause of webservice
> shutdowns, as the grid engine will kill the loop as
> well :-).  So we will probably have to start the webservice
> and then start a watchdog job with the webservice's job number
> as its parameter that periodically checks that the webservice
> is still running and, in case, restarts the webservice.
> But to do that, jobs on execution nodes need to be
> able to submit jobs, and this is still pending
> (cf. https://bugzilla.wikimedia.org/54786).
>
> Tim

_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
