Hello,

I just realized gmond is even better than I thought. It's threaded! I wrote a Python plugin whose callback just sleeps forever, but that didn't stop the other plugin callbacks from running. Ganglia makes me happy...
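For anyone curious, here's roughly what I mean. This is a minimal sketch of a gmond Python metric module following the stock example modules' interface (metric_init / call_back / metric_cleanup); the metric name, value, and descriptor fields here are made up for illustration. Even though the callback blocks, gmond's threading keeps other modules' callbacks running.

```python
import time

_descriptors = []

def sleepy_callback(name):
    # Stand-in for a slow or blocking collection step.  A blocking
    # callback here does not stall other modules' callbacks, because
    # gmond runs module callbacks on threads.
    time.sleep(0.1)
    return 42  # the metric value

def metric_init(params):
    # gmond calls this once at startup; it returns a list of metric
    # descriptors.  Keys follow the example modules shipped with gmond.
    global _descriptors
    _descriptors = [{
        'name': 'sleepy_metric',
        'call_back': sleepy_callback,
        'time_max': 90,
        'value_type': 'uint',
        'units': 'widgets',
        'slope': 'both',
        'format': '%u',
        'description': 'Demonstrates a blocking callback',
        'groups': 'example',
    }]
    return _descriptors

def metric_cleanup():
    # gmond calls this at shutdown; nothing to release in this sketch.
    pass
```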
I like using gmetric to monitor, so I wrote gmetric-daemon: my attempt at a forking standalone daemon that runs Python metric modules and calls gmetric for each metric. I'm going to try to use this framework to monitor the cluster at work.

I'm fairly new to Ganglia, and I suspect some other users have probably written something like this, or maybe they use crontabs. I wanted a slightly different multithreaded approach to monitoring, but it turns out that Python threads really suck, so I made this a forking daemon: one process per module. Not very memory efficient, but then I don't expect to need many modules. I like writing Python scripts that call gmetric; it's easy. I'll soon be writing some more metric monitors for my work.

Here is the GitHub repo if anyone is interested:
http://github.com/david415/gmetric-daemon/tree/master

By the way, it's a rough draft at best right now.

--
Operations Engineer Spinn3r.com
Location: San Francisco, CA
YIM: mrdavids22
Work: http://spinn3r.com
Blog: http://david415.wordpress.com

On Wed, Jan 28, 2009 at 2:28 PM, David Stainton <dstainton...@gmail.com> wrote:
> Greetings,
>
> Gilad, if you are going to rewrite your MySQL Python module to use
> threads, you might want to think more about the race conditions.
> I'll use your very useful MySQL module as an example of how the Python
> module interface is fundamentally flawed by design.
> Multiple metrics are provided by a single blocking query to mysqld
> (e.g. SHOW INNODB STATUS); therefore the programmer of the Python
> module should be in control of data-collection scheduling.
>
> With the current module interface one might think to spawn a collector
> thread that runs continuously and populates a cache, and have each
> callback function read an element asynchronously from the cache.
> This design is flawed because now there are two schedulers!
> The collector thread must schedule its data collection, and the
> .pyconf file must also tell the gmond scheduler when to collect data
> from the cache.
> It's easy to see how it would be difficult or impossible to keep these
> two parallel schedulers aligned so that there are no race conditions
> preventing a consistent view of the data.
>
> I do not think there is a way to use the current Python module
> interface to write modules that correctly handle all the edge cases of
> many real-world problems (e.g. monitoring databases with blocking
> queries that return multiple metrics' worth of data).
>
> I agree, writing a Python script that calls gmetric is easier.
> But I think the situation is deceptive, and I'm looking to make things
> scale well and be highly available/reliable.
> It is not just inconvenient to need a .pyconf per module; it's also a
> design flaw, because it implies two parallel schedulers.
>
> I'd suggest rewriting the Python module interface.
> With the interface I'm imagining, a programmer could easily write
> Python modules that correctly handle multiple metrics and blocking
> calls.
>
> The user of this API would write a single data-collection function
> that returns a tuple of metrics (metric metadata and metric values)
> and scheduling info (e.g. the number of seconds until this function
> should be called again).
> Gmond would spawn a thread for each module.
> Each module thread runs the collector function supplied to it.
> When the collector function returns, the thread would somehow (??)
> update the metric data structures (as gmetric does). The collector
> function also returns scheduling information, for example how long the
> module thread should sleep before calling the collector function
> again. Collector functions would measure how long data collection
> takes and use that information to schedule the next collection.
>
> At this time it seems easier to write a daemon that makes calls to
> gmetric and correctly handles multiple blocking calls which collect
> data for multiple metrics. I'd make sure the daemon spawns a thread
> for each blocking collector (i.e. each module).
>
> The messier equivalent would be to write smaller scripts. Each script
> has a blocking collector which then reports all of its metrics via
> calls to gmetric. Each script is run in parallel, via cron or whatever
> parallel execution scheduler.
>
> Spawning Python threads is obviously more memory efficient than
> forking many processes, but my point here is the equivalent
> scheduling.
>
> I'm going to write the threading daemon (in Python) because it seems
> like the easiest, most scalable, most correct, and most
> reliable/highly available design I can think of.
> It should make writing a module very easy and quick, which is what it
> should be. We shouldn't have to think about threading and race
> conditions if all I want is a simple module.
> Of course it'd be cleaner if I didn't have to popen gmetric, but I'd
> rather have my work be reliable and able to scale to many modules and
> metrics.
>
> Thoughts anyone?
>
> Cheers,
>
> David Stainton
>
> --
> Operations Engineer Spinn3r.com
> Location: San Francisco, CA
> YIM: mrdavids22
> Work: http://spinn3r.com
> Blog: http://david415.wordpress.com

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
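To make the quoted proposal concrete, here is a rough sketch of the per-module collector-thread design: each module supplies one collector function that makes a single blocking call, returns a batch of metrics plus the delay until the next run, and the module's thread reports each metric via a gmetric invocation. Every name here (run_module, report_via_gmetric, the example collector) is illustrative, not an existing API, and the actual gmetric call is left commented out.

```python
import subprocess
import threading
import time

def report_via_gmetric(name, value, mtype='uint32', units=''):
    # Build one gmetric command line per metric.  Uncomment the
    # check_call to actually shell out to gmetric on a real host.
    cmd = ['gmetric', '--name', name, '--value', str(value),
           '--type', mtype, '--units', units]
    # subprocess.check_call(cmd)
    return cmd

def run_module(collect, stop_event, reporter=report_via_gmetric):
    # One thread per module: collect a batch of metrics with ONE
    # blocking call, report them all, then sleep for the interval the
    # collector itself chose -- a single scheduler per module, so no
    # second scheduler in a .pyconf to keep aligned.
    while not stop_event.is_set():
        started = time.time()
        metrics, interval = collect()
        for name, value in metrics:
            reporter(name, value)
        # Subtract collection time when scheduling the next run.
        elapsed = time.time() - started
        stop_event.wait(max(0, interval - elapsed))

def example_innodb_collector():
    # Stand-in for a blocking "SHOW INNODB STATUS" query that yields
    # several metrics' worth of data at once; values are fabricated.
    metrics = [('innodb_rows_read', 123), ('innodb_rows_inserted', 7)]
    return metrics, 15  # collect again in 15 seconds

if __name__ == '__main__':
    stop = threading.Event()
    t = threading.Thread(target=run_module,
                         args=(example_innodb_collector, stop))
    t.start()
    time.sleep(0.2)  # let one collection cycle run
    stop.set()
    t.join()
```

The point of returning the interval from the collector is that scheduling lives in exactly one place, next to the blocking call it paces.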