I've been thinking about something like this applied to SGE: a tool to monitor when jobs are launched, how long they ran, how much memory they used, and other relevant data. When we have a lot of scripts, and many of them have only a simple logging system, detecting that something is working incorrectly or not working at all is quite painful. A basic output format such as XML should be simple enough for tool authors to write a script and build a small-tiny-micro-nano monitoring tool adapted to their needs.
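
To make the idea a bit more concrete, here is a minimal Python sketch of what such an XML report and a tiny consumer could look like. Everything in it is an assumption for illustration: the field names (name, started, wallclock_seconds, max_memory_mb, exit_status), the element layout, and the sample values are hypothetical, and nothing here is read from SGE itself.

```python
import xml.etree.ElementTree as ET


def job_report(jobs):
    """Build an XML report from a list of job dicts (hypothetical schema)."""
    root = ET.Element("jobs")
    for job in jobs:
        # One <job> element per record; all values are stored as text.
        node = ET.SubElement(root, "job", name=job["name"])
        ET.SubElement(node, "started").text = job["started"]
        ET.SubElement(node, "wallclock_seconds").text = str(job["wallclock_seconds"])
        ET.SubElement(node, "max_memory_mb").text = str(job["max_memory_mb"])
        ET.SubElement(node, "exit_status").text = str(job["exit_status"])
    return ET.tostring(root, encoding="unicode")


def failed_jobs(xml_text):
    """Return the names of jobs whose exit status is not zero."""
    root = ET.fromstring(xml_text)
    return [
        job.get("name")
        for job in root.findall("job")
        if job.findtext("exit_status") != "0"
    ]


# Made-up sample records, only to exercise the two functions.
sample = [
    {"name": "statsbot", "started": "2014-05-30T17:00:00",
     "wallclock_seconds": 42, "max_memory_mb": 120, "exit_status": 0},
    {"name": "archivebot", "started": "2014-05-30T17:05:00",
     "wallclock_seconds": 3, "max_memory_mb": 15, "exit_status": 1},
]

report = job_report(sample)
print(report)
print(failed_jobs(report))
```

A tool author would only need something like failed_jobs() on their side; how the records get collected from the grid is left open here.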

2014-05-30 18:39 GMT+01:00 Ryan Lane <[email protected]>:

> Yeah, that seems like a good idea. In general projects always have the
> option on setting up more specific tools inside of their own project if the
> general tool can't easily meet their needs.
>
>
> On Fri, May 30, 2014 at 7:30 AM, Yuvi Panda <[email protected]> wrote:
>
>> Hello!
>>
>> I know that we have some form of icinga for Labs in general, but it is
>> currently pretty dysfunctional and not very useful. Wondering if we
>> should setup a separate icinga just for toollabs that not just
>> provides general monitoring for admins, but also monitoring for
>> individual tools (in a way that's easily customizable by he tool
>> authors themselves). I wrote up a proposal for this a while ago
>> (https://wikitech.wikimedia.org/wiki/User:Yuvipanda/Icinga_for_tools).
>> I think this will help improve general reliability of all our tools
>> and infrastructure.
>>
>> Thoughts?
>>
>> --
>> Yuvi Panda T
>> http://yuvi.in/blog
>>
>> _______________________________________________
>> Labs-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>
>
>
> _______________________________________________
> Labs-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>
>

--
Alchimista
http://pt.wikipedia.org/wiki/Utilizador:Alchimista
