Brad, first of all, thank you very much for your detailed reply!
Please have a look at my comments inline below.

Matthias

On Mon, 14 Jan 2008, Brad Nicholes wrote:

> >>> On 1/12/2008 at 11:00 AM, in message
> <[EMAIL PROTECTED]>, Matthias
> Blankenhaus <[EMAIL PROTECTED]> wrote:
> > Hi Brad!
> >
> > I started looking into the impl of an IPMI Python DSO. Since one of the
> > big advantages of IPMI is out-of-band monitoring, I would need a function
> > that returns the list of nodes reporting to a gmond instance. I am
> > willing to extend the DSO stuff, if you could please give me a pointer.
> >
>
> If I understand correctly what you are looking for, you need a function that
> will return a list of all of the other gmond nodes that have reported
> metrics.

More specifically, I want a list of the hosts that are known to a
specific gmond instance. Hopefully this works for all Ganglia layouts.
E.g., if the admin uses multicast, then I suppose every gmond instance
knows about every host in the Ganglia multicast domain. On the other
hand, if the admin configures a unicast network with a set of gmonds
pointing to one gmond, then that one gmond should have a list of the
gmond hosts talking to it.

> The only problem is that a function like this will only work if all of
> the gmond nodes are reporting their metrics via a multicast connection.
> If the gmond nodes are reporting in a hierarchical manner where all nodes
> report to a single controlling node via a unicast connection, the only
> node that has the information about all of the other nodes is the single
> controlling node.

Which is fine, because the admin will then configure only that one gmond
with the IPMI module. In fact, I'd say that even in the multicast case
the admin should pick only one gmond instance to be responsible for IPMI
monitoring. Calling ipmitool is expensive, so I would try to minimize
the monitoring impact on the cluster.
> The other problem is that the hosts data is declared and stored within the
> gmond binary itself and the only way to get at the hosts data is through
> a gmond external API. Currently gmond doesn't have any external APIs
> that can be called directly from a module or anything else. All
> external calls are in libganglia. Gmond only makes calls out to other
> modules, nothing makes any calls directly back into gmond. One way to
> get around this would be to add another pointer in the mmodule_struct
> in metric.h that holds a read-only list of hosts. The list of hosts
> would have to be maintained by gmond when a new host is detected. Since
> a module's main interface with gmond is through the mmodule_struct, it
> will also hold the pointer to the hosts list and could just read it
> whenever it needs to. Module access to the configuration file data was
> done similar to this. In other words, gmond stuffed the pointer to the
> configuration file handle into the mmodule_struct so that modules could
> read the configuration file directly.

Understood.

>
> Anyway, you should be able to get the information you need by walking the
> 'hosts' apr_hash_t global variable.

What exactly is this APR stuff? I do understand your statements about
the hash table.

> This hash table stores a collection of 'Ganglia_host' structures which is
> defined at the top of gmond.c. There are a couple of examples of how to
> walk the hash table in process_tcp_accept_channel() and cleanup_data()
> functions. The easiest thing to do would probably be to create a function
> that walks the 'hosts' hash table and creates an apr_array_t. If a pointer
> to the apr_array_t that contains the hosts is stuffed into the
> mmodule_struct when a module is loaded, then whenever the apr_array_t is
> updated, all modules will have access to the information.

From below I understand that, for the moment, this excludes Python
modules, right? Does this mean that C modules can access it?
If so, should I rather implement this in C? Fine with me :-)

> I don't necessarily like changing the
> mmodule_struct structure whenever things like this come up because that
> structure will have to be locked down when we ship 3.1.0. But since we
> haven't shipped 3.1.0 yet, the structure can still be considered
> flexible.

I share your concern about changing that structure. However, as you've
mentioned, if there is a good time to do this, it's now.

>
> The only problem from this point is how to expose the same host data through
> to a python module. Except for a return value, the communication from
> gmond through mod_python to a python module is one way.

Yeah, that's what I figured.

> Python modules don't have any way of calling back into gmond for additional
> information.
> The mmodule_struct is not exposed directly to a python module either so
> the same trick of passing a common pointer through a structure won't
> work. The only way to do it would be to add another parameter to the
> handler call when mod_python calls into the python module. But this
> seems kind of messy because 99% of the time for other python modules,
> that parameter will be completely unnecessary.

You mentioned above the one-way communication going from gmond to a
Python module. How about we introduce another optional callback that
takes the host list as a parameter, thus picking up your idea? Then,
whenever gmond updates the host list in mmodule_struct, it calls this
optional method on all Python modules. Of course, if we some day end up
with more than 100 modules, that might become a performance problem.
However, the callbacks could be invoked asynchronously from a dedicated
gmond thread.

> Bottom line is, I'm not sure
> what a good solution is for this whole thing.
>
> >
> > Also, I think it would be useful if the DSO module could unload itself.
> > As I understand all modules that are under
> > /usr/lib/ganglia/python_modules/ are automatically loaded. However, if
> > e.g.
> > the IPMI module determines that it can't function because of missing
> > or unfitting (wrong version) SW pieces, then it should unload itself.
> >
> > What say you?
> >
>
> I think that providing a python module with the ability to unload itself is a
> great idea.
> Basically the metric_init function would need to return some kind of
> indication that it wants to be unloaded. Since the metric_init function
> returns a list of dictionaries that contain the metric definitions that
> the module provides, probably the best way to indicate that it wants to
> be unloaded would be to return an empty list. If mod_python detects an
> empty list, then it would just unload the module and continue on. I
> could see this being useful by allowing the python module to detect if
> any of the metrics that it provides have been referenced in the
> configuration file. If not then just unload itself.

I like this implementation. Do we need to allocate a NULL parameter set
for modules that expose no metrics and only implement a side effect? Is
that even conceivable?

>
> Let me know if you have any more questions,
>
> Brad
>

_______________________________________________
Ganglia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
