We forked 3.0.4 so that it just uses a simple "while xdr_decode()..." loop to handle multiple metrics per packet, and it works pretty well. One additional trick we applied was allowing the client to send timestamped messages. That adds a dependency (your machines have to be reliably time-synchronized), but it lets our client code (based on a pretty simple Ruby class) fill a UDP packet before transmitting. For clients that don't send timestamped messages (e.g., gmond itself), we send a metric group at a time. Doing this took our multicast packet volume down by a factor of about 20 and let us more than triple the number of metrics per node we run. The Ruby class sends the packets directly over a socket interface.
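The Ruby class itself is tiny and isn't reproduced here, but to give a flavor of the packing, here is a rough Python sketch of the same idea. It is not our actual code: the field layout is my reading of the stock 3.0-style gmetric XDR message (format id, then type/name/value/units strings, then slope/tmax/dmax), the multicast address is just the usual 239.2.11.71:8649 channel, the metric names are made up, and our timestamp extension is left out since its wire format is specific to our fork.

import socket
import struct

GANGLIA_ADDR = ("239.2.11.71", 8649)   # default-style multicast channel; adjust to your setup
MTU_BUDGET = 1400                      # keep each datagram under a typical Ethernet MTU

def xdr_string(s):
    """XDR string: 4-byte big-endian length, data, padded to a 4-byte boundary."""
    data = s.encode("ascii")
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad

def xdr_uint(n):
    return struct.pack(">I", n)

def encode_metric(name, value, type_="uint32", units="", slope=3, tmax=60, dmax=0):
    """One 3.0-style user-defined gmetric message (format id 0)."""
    return (xdr_uint(0) +              # 0 == user-defined metric
            xdr_string(type_) +
            xdr_string(name) +
            xdr_string(str(value)) +   # values travel as strings on the wire
            xdr_string(units) +
            xdr_uint(slope) +          # 3 == "both"
            xdr_uint(tmax) +
            xdr_uint(dmax))

def send_bundled(metrics, sock):
    """Fill a datagram with as many encoded metrics as will fit, then transmit."""
    buf = b""
    for name, value in metrics:
        msg = encode_metric(name, value)
        if buf and len(buf) + len(msg) > MTU_BUDGET:
            sock.sendto(buf, GANGLIA_ADDR)   # flush the full packet
            buf = b""
        buf += msg
    if buf:
        sock.sendto(buf, GANGLIA_ADDR)

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_bundled([("app_queue_depth", 12), ("app_req_rate", 731)], s)

The matching receive-side loop is sketched after the quoted thread at the bottom of this message.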
Taken together, we find this architecture far more compelling than the "plug-in" model, since failures or delays in a specific monitor (say, an application monitoring daemon) don't cause failures of the entire monitoring service for a node. It's also far simpler to deal with process-parallel monitors than thread-parallel ones. However, it's not forward compatible: unmodified gmonds ignore all metrics after the first one in the packet (and even that one if it's timestamped). Also, we still have a mysterious leak in our gmond (which seems to be related to spoofed metrics, not the bundled metrics), so I've never tried submitting the patches. They wouldn't work for the new format in 3.1.x anyway. I've been meaning to redo the patches for 3.1, but have never been able to allocate the time.

I'm particularly interested in your 150K number. We don't have any single cluster that goes that high, but I've found that's about as far as I can take a gmetad (grid), regardless of the number of underlying clusters (I've used anywhere from four to 24). Somewhere in the 120-150K range we start seeing gaps at the cluster-metric and grid-metric level; the gaps usually appear in the grid metrics before the cluster metrics. Since gmetad has a thread per cluster, I've always suspected the gaps were caused by the mutex that controls grid-level metric updates; thanks for the clue about UDP issues.

-- ReC

On Feb 17, 2010, at 2:58 PM, Scott Dworkis wrote:

>> packets together, if nothing else), and that would tend to reduce
>> the packet rate slightly. Gmond could probably be a little more
>
> sure, and would save udp header overhead too.
>
> -scott
>
> On Wed, 17 Feb 2010, Jesse Becker wrote:
>
>> On Wed, Feb 17, 2010 at 16:35, Scott Dworkis <[email protected]> wrote:
>>> i think i saw the Ganglia-Gmetric and it appeared fork-based, so i had
>>> scale concerns.
>>>
>>> embeddedgmetric looks good... maybe spent 3 years in alpha though? but if
>>> it works, who cares.
>>
>> Well, there isn't a whole lot to the protocol, so it's possible that
>> there really isn't much more work that needs to be done. :)
>>
>>> one nice thing about the csv approach is it should scale for any language
>>> that can do standard io... even a shell or sed script.
>>
>> Agreed, but neither sed nor awk[1] can send gmetric packets natively;
>> Perl at least has modules to help with it.
>>
>>> i wonder if/how-much a packet is cheaper than a fork? but yeah ideally
>>
>> I would be shocked if sending a UDP packet wasn't substantially cheaper
>> than a full-blown fork (which then needs to *also* send a UDP packet).
>>
>>> you'd consolidate packets. as far as i can tell i haven't bumped into
>>> packet-rate issues as much as packet-byte-rate issues, which would remain
>>> even in a consolidated packet scenario.
>>
>> Interesting point. However, consolidating packets does imply a bit
>> more pre-processing of the data on the sender's side (to glom the
>> packets together, if nothing else), and that would tend to reduce
>> the packet rate slightly. Gmond could probably be a little more
>> efficient about memory allocation as well if it could allocate a
>> larger lump of space at once for multiple metrics instead of dealing
>> with each one piecemeal.
>>
>>
>> [1] gawk has some very odd built-in "device files" that you can use.
>> Check out the manpage for "Special File names", and then look at the
>> /inet/udp/lport/rhost/rport "files".
>>
>> --
>> Jesse Becker
>> Every cloud has a silver lining, except for the mushroom-shaped ones,
>> which come lined with strontium-90.
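For the curious, the receive side of our patch amounts to "keep decoding until the buffer runs out." The actual change is in gmond's C XDR path and isn't reproduced here; the following is only a rough Python sketch of that loop, using the same assumed 3.0-style field layout as the sender sketch above (again, without our timestamp extension).

import struct

def read_uint(buf, off):
    """XDR unsigned int: 4 bytes, big-endian."""
    return struct.unpack_from(">I", buf, off)[0], off + 4

def read_string(buf, off):
    """XDR string: length, data, padding to a 4-byte boundary."""
    n, off = read_uint(buf, off)
    s = buf[off:off + n].decode("ascii")
    return s, off + n + ((4 - n % 4) % 4)

def decode_bundle(datagram):
    """Decode metric messages until the datagram is exhausted -- the moral
    equivalent of the "while xdr_decode()..." loop in our gmond fork."""
    metrics, off = [], 0
    while off < len(datagram):
        fmt, off = read_uint(datagram, off)       # 0 == user-defined metric (assumed)
        type_, off = read_string(datagram, off)
        name, off = read_string(datagram, off)
        value, off = read_string(datagram, off)
        units, off = read_string(datagram, off)
        slope, off = read_uint(datagram, off)
        tmax, off = read_uint(datagram, off)
        dmax, off = read_uint(datagram, off)
        metrics.append((name, type_, value, units, slope, tmax, dmax))
    return metrics

An unmodified gmond stops after the first message, which is exactly the forward-compatibility problem described above; the patched loop simply keeps going while bytes remain.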

