Go with Hedonil's scripts. They're very good.
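
For anyone who wants to see the general idea first, here is a minimal watchdog sketch. It is not Hedonil's code and I am not claiming his scripts work this way. It assumes that `qstat -xml` lists your jobs with a `JB_name` element, that the job is called `lighttpd-tsreports` (taken from the qacct output quoted below), and that `webservice start` is the right command to bring the service back. The function names are made up for illustration.

```python
import subprocess
import xml.etree.ElementTree as ET


def job_names(qstat_xml_text):
    """Collect the text of every JB_name element in `qstat -xml` output."""
    root = ET.fromstring(qstat_xml_text)
    return {el.text for el in root.iter("JB_name") if el.text}


def ensure_running(job_name, list_jobs, start):
    """Call start() if job_name is absent; return True when a start was requested.

    list_jobs and start are passed in so the check can be tried out
    without a grid engine at hand.
    """
    if job_name in job_names(list_jobs()):
        return False
    start()
    return True


def qstat_xml():
    # Assumption: `qstat -xml` is available and prints the current user's jobs.
    result = subprocess.run(
        ["qstat", "-xml"], check=True, capture_output=True, text=True
    )
    return result.stdout


def start_webservice():
    # Assumption: `webservice start` is the restart command on your setup.
    subprocess.run(["webservice", "start"], check=True)
```

Run periodically (from cron, say), a call such as `ensure_running("lighttpd-tsreports", qstat_xml, start_webservice)` would request a start whenever the job has vanished. It does nothing about the underlying memory problem; it only shortens the downtime.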

Sent from Maximilian's iPhone.

> On Jun 10, 2014, at 06:36, Magnus Manske <[email protected]> wrote:
> 
> As the maintainer of several dozen tools, I see this happen on a regular 
> basis. There is no automatic notification and no automatic restart. Pitiful, 
> really.
> 
> Hedonil has written a set of scripts to run the webservice in a more reliable 
> manner, and even has an "auto-restarter", which I use for some of the tools 
> where the standard webservice used to die on an almost daily basis.
> 
> Tools Labs should really improve this.
> 
> 
>> On Tue, Jun 10, 2014 at 10:28 AM, Merlijn van Deen <[email protected]> 
>> wrote:
>> Hello all,
>> 
>> My 'tsreports' webservice randomly dies every now and then. qacct suggests 
>> this is due to OOM:
>> 
>> tools.tsreports@tools-login:~$ qacct -j 487745
>> qname        webgrid-lighttpd
>> (...)
>> jobname      lighttpd-tsreports
>> jobnumber    487745
>> (...)
>> qsub_time    Wed Apr 23 08:18:12 2014
>> start_time   Fri May 23 14:30:17 2014
>> end_time     Fri Jun  6 10:51:21 2014
>> (...)
>> failed       0
>> exit_status  0
>> (...)
>> maxvmem      3.973G
>> 
>> 
>> I have no clue how to debug this, though; the lighttpd error log just shows
>> 
>> 2014-06-06 10:51:20: (mod_fastcgi.c.3061) got proc: pid: 12119 socket: 
>> unix:/tmp/tsreports-index.fcgi.sock-0 load: 1
>> 2014-06-06 10:51:20: (server.c.1512) server stopped by UID = 0 PID = 12087
>> 2014-06-06 10:51:20: (server.c.1502) unlink failed for: 
>> /var/run/lighttpd/tsreports.pid 2 No such file or directory
>> 2014-06-06 10:51:20: (server.c.1512) server stopped by UID = 0 PID = 12087
>> 2014-06-06 10:51:20: (server.c.1502) unlink failed for: 
>> /var/run/lighttpd/tsreports.pid 2 No such file or directory
>> 2014-06-06 10:51:20: (server.c.1502) unlink failed for: 
>> /var/run/lighttpd/tsreports.pid 2 No such file or directory
>> 2014-06-06 10:51:20: (server.c.1512) server stopped by UID = 0 PID = 12087
>> 2014-06-06 10:51:21: (server.c.1502) unlink failed for: 
>> /var/run/lighttpd/tsreports.pid 2 No such file or directory
>> 2014-06-06 10:51:21: (server.c.1512) server stopped by UID = 0 PID = 12087
>> 2014-06-06 10:51:20: (server.c.1512) server stopped by UID = 0 PID = 12087
>> 
>> which is not very informative, to say the least.
>> 
>> So: how can one debug these issues?
>> 
>> To add insult to injury, SGE doesn't even send an e-mail to tell me it 
>> killed the webserver, nor does it restart the webserver. Either of those 
>> would be reasonable (especially restarting the webserver). As it was, I 
>> had to be notified by someone on my talk page...
>> 
>> Merlijn
>> 
>> _______________________________________________
>> Labs-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>> 