Can you link us to some manual page for that utility? Thanks

On Sat, Jul 12, 2014 at 8:45 PM, Matanya <[email protected]> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> Hi All,
> The best tool to make sure your webserver is running is monit. It will monitor
> the service and bring it back up if it dies. Furthermore, it can mail you on
> every action taken, and its config file even has an option to declare how
> many restart attempts it makes before giving up. Other features are available
> as well.
>
> On 12 July 2014 15:00:37 GMT+03:00, [email protected] wrote:
>>Send Labs-l mailing list submissions to
>>        [email protected]
>>
>>To subscribe or unsubscribe via the World Wide Web, visit
>>        https://lists.wikimedia.org/mailman/listinfo/labs-l
>>or, via email, send a message with subject or body 'help' to
>>        [email protected]
>>
>>You can reach the person managing the list at
>>        [email protected]
>>
>>When replying, please edit your Subject line so it is more specific
>>than "Re: Contents of Labs-l digest..."
>>
>>
>>Today's Topics:
>>
>>   1. On disk use (Marc A. Pelletier)
>>   2. Re: Webservice (Petr Bena)
>>   3. Re: Webservice (Marc-André Pelletier)
>>
>>
>>----------------------------------------------------------------------
>>
>>Message: 1
>>Date: Fri, 11 Jul 2014 12:23:53 -0400
>>From: "Marc A. Pelletier" <[email protected]>
>>To: Wikimedia Labs <[email protected]>
>>Subject: [Labs-l] On disk use
>>Message-ID: <[email protected]>
>>Content-Type: text/plain; charset=ISO-8859-1
>>
>>Hey all.
>>
>>So, a quick reminder to every labs user: project space (/data/project)
>>is on a networked drive. While it provides a lot of space and is
>>conveniently accessible to all instances of a project, using it /does/
>>incur a performance cost.
>>
>>Whenever a service you are running on a labs instance needs to have
>>/local/ storage (that is, does not need to share the data with other
>>instances), it is generally preferable to _not_ use /data/project for
>>it.
>>Being careful about when you use this filesystem means improved
>>performance and reliability for everyone.
>>
>>On Tool Labs, where your tools can be moved arbitrarily from one node
>>to another, this mostly does not apply - you should be storing any
>>persistent data in your tools' homes (which are on /data/project). It
>>*is* possible to store data locally to the instance where the tool is
>>running (for temporary data that does not need to persist from one run
>>to another), provided you are careful about cleaning up after yourself.
>>
>>In any case, if you have any questions about your disk usage or ways in
>>which you can improve performance, don't hesitate to ask on-list or
>>communicate with me by email or on IRC.
>>
>>-- Marc
>>
>>
>>
>>------------------------------
>>
>>Message: 2
>>Date: Fri, 11 Jul 2014 20:30:25 +0200
>>From: Petr Bena <[email protected]>
>>To: Wikimedia Labs <[email protected]>
>>Subject: Re: [Labs-l] Webservice
>>Message-ID:
>>        <ca+4eq5ftpwksqmrf7n8ix03zoh4pazgcobx_y0qhhmig5pe...@mail.gmail.com>
>>Content-Type: text/plain; charset=UTF-8
>>
>>nope, just once a minute, crontab doesn't handle seconds, it wouldn't
>>fire up anything but the check if it's running
>>
>>On Fri, Jul 11, 2014 at 12:26 AM, Hasteur Wikipedia
>><[email protected]> wrote:
>>> Um... That's a very very bad idea. A crontab entry like that will
>>> fire multiple times a minute. What's the largest downtime that the
>>> service can tolerate?
>>>
>>> Sent from my iPhone
>>>
>>>> On Jul 10, 2014, at 4:52 PM, Petr Bena <[email protected]> wrote:
>>>>
>>>> what about appending this to crontab:
>>>>
>>>> * * * * * webservice start
>>>>
>>>>> On Thu, Jul 10, 2014 at 5:34 PM, Tim Landscheidt
>>>>> <[email protected]> wrote:
>>>>> Magnus Manske <[email protected]> wrote:
>>>>>
>>>>>> I've been manually restarting about a dozen webservices for my
>>>>>> tools in the last 24h.
>>>>>
>>>>>> And before you say it, some of those were Hedonil's hand-rolled
>>>>>> webservice.
>>>>>
>>>>>> Could we PLEASE either have a Labs-official, auto- and
>>>>>> self-restarting webservice, or something a little more stable
>>>>>> than lighttpd (or a more stable way to run it)?
>>>>>
>>>>> I looked at all the tools you are a developer of and I assume
>>>>> you speak about wikidata-todo. This has some logs that
>>>>> appear to have indications of OOM shutdowns.
>>>>>
>>>>> You use a custom lighttpd configuration, and I'm not sure if
>>>>> the decision to have two PHP FCGIs doubles the memory
>>>>> requirements, at the moment using 6 GBytes out of 7 GBytes
>>>>> requested.
>>>>>
>>>>> What is clear however is that your PHP script:
>>>>>
>>>>> | 2014-07-10 14:11:39: (mod_fastcgi.c.2701) FastCGI-stderr: PHP Fatal error: Allowed memory size of 2621440000 bytes exhausted (tried to allocate 71 bytes) in /data/project/wikidata-todo/public_html/autolist2.php on line 201
>>>>>
>>>>> uses almost 2.5 GByte of memory -- if I don't misread the
>>>>> documentation -- per /request/.
>>>>>
>>>>> Memory is cheap and we could just increase the requested
>>>>> limit, but I assume there are some PHP developers around who
>>>>> might want to have a poke at optimizing
>>>>> <https://bitbucket.org/magnusmanske/wikidata-todo/src/master/public_html/autolist2.php>.
>>>>>
>>>>> Regarding self-restarting web services, with continuous jobs
>>>>> we have a "while ! $JOB; do sleep 5; done" loop that ensures
>>>>> that the job is restarted if it aborts. This however does
>>>>> not work on OOMs that are the predominant cause of webservice
>>>>> shutdowns, as the grid engine will kill the loop as
>>>>> well :-). So we will probably have to start the webservice
>>>>> and then start a watchdog job with the webservice's job number
>>>>> as its parameter that periodically checks that the webservice
>>>>> is still running and, in case, restarts the webservice.
>>>>> But to do that, jobs on execution nodes need to be
>>>>> able to submit jobs, and this is still pending
>>>>> (cf. https://bugzilla.wikimedia.org/54786).
>>>>>
>>>>> Tim
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Labs-l mailing list
>>>>> [email protected]
>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>
>>
>>
>>------------------------------
>>
>>Message: 3
>>Date: Fri, 11 Jul 2014 15:12:46 -0400
>>From: Marc-André Pelletier <[email protected]>
>>To: [email protected]
>>Subject: Re: [Labs-l] Webservice
>>Message-ID: <[email protected]>
>>Content-Type: text/plain; charset=UTF-8
>>
>>On 07/11/2014 02:30 PM, Petr Bena wrote:
>>> nope, just once a minute, crontab doesn't handle seconds, it
>>> wouldn't fire up anything but the check if it's running
>>
>>I'm currently looking at some system by which tool maintainers may
>>specify automatic restart scripts that are sufficiently robust for
>>general use.
>>
>>Amongst other requirements, it will send an obligatory email on every
>>restart and will refuse to restart more than X times in a window of Y
>>(where X and Y are still undecided but likely to be 3 and 24h). This
>>is to prevent problematic tools from hammering on resources unattended.
>>
>>-- Marc
>>
>>
>>
>>
>>------------------------------
>>
>>End of Labs-l Digest, Vol 31, Issue 11
>>**************************************
> -----BEGIN PGP SIGNATURE-----
> Version: APG v1.1.1
>
> iQJABAEBCgAqBQJTwYI6IxxNYXRhbnlhIE1vc2VzIDxtYXRhbnlhQGZvc3MuY28u
> aWw+AAoJEKzSGXfsOI0veOEP/jXtutHDwYHzIzU/zQ6tHV89xufuD0+2pKVPzqmh
> VyGclpkelV/JVjHvygnovJcluqfPA/smUTBn7YgwBtaT2ElEUoeCpim/ljOdxqLE
> dAAgEt9JtoBytqJxZ5z6uQkoMK1k92xjUP8U9wp9ZYqrn7i89MkuxmdaUhXp0KlM
> DmoqC+Cg1XxBk6Zq7wOQYLv3Lr5uSvUynvd3rCQI0wPlsWMr+B0r5nGV1zb+DWKZ
> qzFYvUW7FONdxglde3vgrMhxcl2zWEtcHz0uh/ucMcSPcoUbiUi2Oy6tPZOwscEp
> nCp9Yq0nBvD//GsA/hnKXp0sLK1VtFIv0cpufevZhdoV8/4+cy36bMToRx3KBiJg
> WGxfjq85vEgC9yl2+2D9Pyuz2mK+bUrKcbVyQm0FVlOyIQAKQvJ4pqff8VzlLFsR
> nYLRdE+RwjsG8k4hYDVqlln0KxlNT3ZbNErNJ84ndbEXVlWreM0tTWREcof3nn/x
> 22ijokyz8F4AifUAo7AyYk7elJeSQVdizxptQSj0CtQXLa34R/wgwgHgxavhsPiH
> Bwma4RHnVoVSZhpd1kgSOMy2lPTRK/Ww+FfwvJyZ9m18pGg0W7cm81JHDpC5BmEM
> 58HXmhxvcCvq1snpmcdNrac30FzotYscryC/Fed6dP2lOLiXZsGGxTdKvuAb95Qa
> zPlH
> =YX2r
> -----END PGP SIGNATURE-----
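
For what it's worth, the restart policy Marc describes in the quoted thread (refuse to restart more than X times in a window of Y, likely 3 and 24h) could be tracked along these lines. This is only a rough Python sketch of the bookkeeping: the class and method names are made up for illustration and are not part of monit or of any Labs tooling, and only the 3 and 24h defaults come from the thread.

```python
from collections import deque


class RestartLimiter:
    """Hypothetical helper: allow at most max_restarts per sliding window."""

    def __init__(self, max_restarts=3, window_seconds=24 * 3600):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        # Timestamps (in seconds) of restarts that are still inside the window.
        self._history = deque()

    def allow_restart(self, now):
        """Return True and record the restart if the limit is not reached."""
        # Forget restarts that have aged out of the window.
        while self._history and now - self._history[0] >= self.window_seconds:
            self._history.popleft()
        if len(self._history) >= self.max_restarts:
            # Too many recent restarts: give up so a human can take a look.
            return False
        self._history.append(now)
        return True
```

A watchdog could call allow_restart(time.time()) before issuing "webservice start" and send mail whenever it returns False. That wiring is an assumption and is not shown here. The history lives in memory only, so a check started fresh from cron every minute would have to save and reload it between runs.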
