matt massie wrote:
steve-
you can see that the only metadata i've put in right now is plugin name,
author and version (see test-plugin.c ganglia_main()). i welcome any
ideas for more metadata that we need.
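to make that concrete, here's roughly the shape of the metadata block
i'm picturing.. the struct and field names below are just illustrative,
not the real g3 types (test-plugin.c has the real thing):

    /* illustrative only -- the actual fields live in test-plugin.c's
       ganglia_main(); this just shows the three pieces of metadata
       mentioned above */
    typedef struct {
       const char *name;     /* plugin name     */
       const char *author;   /* plugin author   */
       const char *version;  /* plugin version  */
    } plugin_metadata_t;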
right now, i envision that we will have a service plugin directory (say
/var/lib/ganglia/services). every file in that directory will be loaded
at gmond (gserviced?) startup. each of the service plugins will then load
up all the data collect/publish plugins that it needs. keep in mind that
a collect/publish module is not restricted in the number of metrics it can
process.
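the startup scan itself would be dead simple.. something like this
sketch (only the directory path and the dlopen()-per-file idea come from
the design above; the rest is guesswork):

    #include <dirent.h>
    #include <dlfcn.h>
    #include <stdio.h>
    #include <string.h>

    /* sketch: dlopen() every shared object in the service directory */
    static void load_service_plugins(const char *dir)
    {
       DIR *d = opendir(dir);
       struct dirent *e;
       char path[1024];

       if (!d)
          return;
       while ((e = readdir(d)) != NULL) {
          if (!strstr(e->d_name, ".so"))
             continue;
          snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
          if (!dlopen(path, RTLD_NOW))
             fprintf(stderr, "skipping %s: %s\n", path, dlerror());
       }
       closedir(d);
    }

gmond would call that once with /var/lib/ganglia/services at startup.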
So a service plug-in, in this particular case, would be something along the
lines of "CPU monitoring," and would load a percentage collector, a
number-of-CPUs collector, possibly a temperature collector, etc.?
I assume these plugins only have to be initialized once. Are we thinking
about what happens when a monitored component is added/removed while the
daemon is running? I'm not talking CPUs here, but think about a disk monitor.
You have your (probably Linux-specific) disk monitor service. This checks
out your attached devices and loads things like a SMART status monitor
plug-in, a filesystem-per-disk metric, and so forth. Ganglia runs for a
while. Then the RAID array's taken offline to be rebuilt, or another one
is added.
Do the service or collector plug-ins support some form of messaging/event
model that would allow this to happen during the course of normal operation
or would this involve some sort of SIGHUP-style daemon-kicking?
It's entirely possible that an individual collector could notice something
that requires a rescan by the other collectors in that service (the SCSI
monitor notices a new disk just got added to the array and sends a "rescan"
event to its parent disk monitoring service, to use the example above).
This same framework could allow an enterprising individual to write a
notifier front-end that sends SNMP traps, e-mails, smoke signals, or
updates a display on the front of the box when certain events occur.
you can see how i changed the job scheduler. each job has a job-specific
collect and publish function now (see g3_job_t in g3.h). i needed to have
both functions in each job (instead of linking them) in order for us to
have multiple service frontends.
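in sketch form (g3.h has the real definition; the field names here are
from memory and may not match exactly):

    /* each job carries its own collect and publish hooks now,
       instead of having them linked in globally */
    typedef struct g3_job {
       int (*collect)(struct g3_job *job);   /* gather metric data   */
       int (*publish)(struct g3_job *job);   /* hand it to frontends */
       void *data;                           /* job-private state    */
    } g3_job_t;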
That's what I saw yesterday, and it makes sense to me. But is each job
associated with a single metric? Will a plug-in be able to share data
between its instances?
What I'm getting at is, if you have a job for monitoring each mounted local
filesystem, and they all use xfs_monitor.so, and there *isn't* a shared
memory location for them all to stash the most recent results, then you're
polling $NUMBER_OF_PARTITIONS times more often than you need to be. Which
is programmatically gross and in some time-sensitive environs could be
construed as bad.
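A timestamped cache owned by the plug-in itself would cover it.
Sketching (this is pure speculation on my part, nothing in g3 today;
expensive_poll() is a hypothetical stand-in for the real device poll):

    #include <time.h>

    #define CACHE_TTL 15  /* seconds; arbitrary */

    /* hypothetical stand-in for the real device poll */
    static double expensive_poll(void) { return 0.0; }

    static time_t last_poll;      /* shared by every job instance */
    static double cached_value;

    /* any job instance firing within CACHE_TTL of the last real
       poll reuses the cached result instead of hitting the device
       again */
    double get_metric(void)
    {
       time_t now = time(NULL);

       if (now - last_poll >= CACHE_TTL) {
          cached_value = expensive_poll();
          last_poll = now;
       }
       return cached_value;
    }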
And if each job resolves to a plug-in, and it's up to the plug-in to make
the metrics ... hmmm, I guess that answers all the questions that I've
actually raised up to this point. DOH! Except about the event model.
this also allows us to have push AND pull methods for publishing metrics.
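in sketch form, reusing the g3_job_t shape from above (illustrative
only):

    /* the same per-job publish hook can be driven both ways */
    void on_schedule_tick(g3_job_t *job)     /* push: daemon-initiated */
    {
       job->collect(job);
       job->publish(job);
    }

    void on_frontend_request(g3_job_t *job)  /* pull: client-initiated */
    {
       job->publish(job);   /* ship whatever was last collected */
    }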
This will make Lester very happy. :)
you'll see inetaddr.c tcp.c udp.c and mcast.c in the distribution now.
g3 will have a full multicast, udp and tcp library to use in building
these services. i've compiled and tested the networking library on Linux,
Solaris, FreeBSD, Cygwin and MacOS X.
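for flavor, here's roughly the raw BSD socket dance that mcast.c has to
wrap portably (using gmond's usual 239.2.11.71:8649 channel; error
checks trimmed, and none of this is the actual g3 API):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    /* join a multicast group on a udp socket */
    int join_mcast_group(void)
    {
       int fd = socket(AF_INET, SOCK_DGRAM, 0);
       struct sockaddr_in addr;
       struct ip_mreq mreq;

       memset(&addr, 0, sizeof(addr));
       addr.sin_family      = AF_INET;
       addr.sin_addr.s_addr = htonl(INADDR_ANY);
       addr.sin_port        = htons(8649);
       bind(fd, (struct sockaddr *)&addr, sizeof(addr));

       mreq.imr_multiaddr.s_addr = inet_addr("239.2.11.71");
       mreq.imr_interface.s_addr = htonl(INADDR_ANY);
       setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                  &mreq, sizeof(mreq));
       return fd;
    }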
When there's a front-end ready, *that's* when I'll start getting excited.
Is there any reason to make a g3 metadaemon? Wouldn't it be possible to
implement this as one or more front-end/service plug-ins?
.. i got off track there ..
back to the plugin question... if a plugin is compiled on a different
platform than the one trying to load it, then dlopen() will fail and we
won't even be able to get at the metadata.
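i.e. the failure mode is simple.. dlopen() hands back NULL and all we
can get is dlerror()'s string:

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
       /* a plugin built for the wrong platform never gets further
          than this: dlopen() returns NULL, so the metadata symbols
          inside the .so are unreachable */
       void *h = dlopen("./test-plugin.so", RTLD_NOW);

       if (!h) {
          fprintf(stderr, "can't load plugin: %s\n", dlerror());
          return 1;
       }
       dlclose(h);
       return 0;
    }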
woohoo!
i think the question you are hitting on is this.. what is the best
approach to building the plugins: platform-specific or metric-specific?
platform-specific means that a developer builds a plugin which only works
on a single target platform but has many metrics (this is more like our
approach in the past).. OR.. should we have a metric-specific plugin (say
load) which only measures a single metric but works across a range of
platforms. i think the first approach is best...
I think a combination is best, actually. There are some POSIX-y things out
there that we can monitor on anything. Not a lot, but it's something.
Enough to encourage people to write their own stuff.
I'm talking about something like a uname plugin that works on a pretty wide
range of systems. The MTU value, as well. There are a few instances in
the machine/*.c code where we've reinvented the wheel in several shapes and
sizes. It would be nice to eliminate that.
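For instance, the portable core of a uname metric is just this (plug-in
glue omitted; this is plain POSIX, nothing g3-specific):

    #include <stdio.h>
    #include <sys/utsname.h>

    int main(void)
    {
       struct utsname u;

       /* uname() is POSIX and works on every platform we support */
       if (uname(&u) == 0)
          printf("%s %s %s\n", u.sysname, u.release, u.machine);
       return 0;
    }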
But I do think that we should settle on the baseline metrics we're
working towards for all supported platforms, whether we decide that
privately for our own purposes or state it publicly. It doesn't seem
to be very widely known that Ganglia's metric output varies by platform.
Maybe for
g3 we should make a pretty chart that shows the metrics supported per
platform...
.. you know.. i just realized that i'm rambling on and on.. if you find
anything useful in this message.. please feel free to reply..
Rambling is what developer lists are for!