Gmond is single-threaded.  Gmetad, however, is not when it produces the XML 
dump.  Would it be possible for you to use the Gmetad port rather than hitting 
Gmond directly?  If you hit the Gmetad interactive port, you could request data 
for any of your individual clusters from your script.  Multi-threading Gmond 
output would be a nice thing to have.  I guess it hasn't really been an issue 
in the past because most people only hit Gmond from a single client.
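
As a rough illustration of the interactive-port approach, here is a minimal
Python sketch.  It assumes the interactive port is 8652 (the usual default;
check interactive_port in your gmetad.conf) and that a request is a single
path line such as "/clustername" or "/clustername/hostname".  The function
names and the example host and cluster names are made up for illustration.

```python
import socket


def build_query(cluster=None, host=None):
    """Build a request line such as '/web' or '/web/node1' (names are examples)."""
    if host and not cluster:
        raise ValueError("a host can only be requested within a cluster")
    parts = [p for p in (cluster, host) if p]
    return "/" + "/".join(parts) + "\n"


def fetch_xml(server, port=8652, cluster=None, host=None, timeout=15.0):
    """Send one query to the (assumed) interactive port and return the raw XML."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_query(cluster, host).encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # the server closes the connection when the dump is done
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, fetch_xml("gmetad.example.com", cluster="web") would ask only for
the "web" cluster rather than pulling every cluster's data in one dump.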

Brad 

>>> Ben Hartshorne <[EMAIL PROTECTED]> 04/25/08 12:35 PM >>> 
Hi,

I have a rather large set of machines I have ganglia watch (~6000), and
am trying to build out a resilient infrastructure.  I ran into an
interesting problem.

I am using gmond version 3.0.2.200511011714 (as reported by --version)

Basic layout - each location (~2000 machines) has a pair of hosts to
which the machines send their metrics (unicast).  There is a pair of
machines that connect to gmond on each of the edge collectors and
centralize the data (they connect via TCP to port 8649).  We also have
another pair of machines that connect to each edge gmond and grab the
current XML dump for integration with Nagios (the script is called
parse_ganglia for future reference).

This worked nicely for quite a while, until one of our edge hosts got
too many reportees.  There was a connection timeout in parse_ganglia of
5 seconds, so that when one of the edge hosts was down it would move on
to the other edge hosts quickly rather than waiting 60s for the down
host.  When one of the hosts got too many reportees, it started to take
~6s to transfer all the data.  At this point, one or the other of the
pair of hosts running parse_ganglia started failing on the edge host
that had too many reportees.  

Using tcpdump, I found that though gmond was accepting the connection
from both of them, it would only send data to one at a time, and it
would finish sending data to the first before moving on to the second.  So:
* host a connects
* host a starts getting data
* host b connects (3-way handshake complete) but no data flows
* host a finishes getting data
* host b starts getting data
* host b finishes getting data

We solved the immediate problem by increasing the timeout from 5 to
15s, but I was a little surprised that gmond behaved in this
seemingly-single-threaded manner.
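
One way to keep the fast failover for down hosts while tolerating a slow or
queued transfer is to separate the connect timeout from the overall transfer
deadline.  Since gmond completes the handshake even while it is busy with
another client, a short connect timeout should only trip for hosts that are
really down.  The sketch below is not the actual parse_ganglia code; the
function names and the 30s deadline are assumptions for illustration.

```python
import socket
import time


def fetch_dump(host, port=8649, connect_timeout=5.0, total_deadline=30.0):
    """Return the XML dump from one collector, or raise OSError on failure."""
    start = time.monotonic()
    # Short timeout here: a down host fails fast so we can try its partner.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    chunks = []
    with sock:
        while True:
            remaining = total_deadline - (time.monotonic() - start)
            if remaining <= 0:
                raise socket.timeout("transfer exceeded total deadline")
            # Longer budget here: covers waiting behind another client
            # plus the transfer itself.
            sock.settimeout(remaining)
            data = sock.recv(65536)
            if not data:  # collector closed the connection: dump complete
                break
            chunks.append(data)
    return b"".join(chunks)


def fetch_first_available(hosts, **kwargs):
    """Try each collector in order; return (host, xml) for the first success."""
    errors = {}
    for host in hosts:
        try:
            return host, fetch_dump(host, **kwargs)
        except OSError as exc:  # socket.timeout is a subclass of OSError
            errors[host] = exc
    raise OSError("all collectors failed: %r" % errors)
```

With this split, raising total_deadline as the number of reportees grows does
not slow down failover when an edge host is unreachable.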

While it's easy for us to adjust the timeout in our python
parse_ganglia, it is not so easy to poke at gmetad, and I am worried
about what will happen when we have variations in network quality, more
hosts requesting metrics, etc.  

Is it true that gmond is single threaded in its network operations?  Or
maybe just the listener?  What other effects might this have?  

Would it make sense to change gmond so it passes off dumping the XML
feed to a child thread so that multiple simultaneous connections can be
handled?
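
To make the idea concrete, here is a small sketch of the thread-per-connection
pattern in Python.  This is not gmond's code, only an illustration of the
proposal; the class names and the placeholder payload are invented, and a real
implementation would also need to guard the metric data against concurrent
updates while a dump is in progress.

```python
import socketserver
import threading

XML_SNAPSHOT = b"<GANGLIA_XML></GANGLIA_XML>\n"  # placeholder payload


class DumpHandler(socketserver.BaseRequestHandler):
    """Runs in its own thread for each client and writes one full dump."""

    def handle(self):
        self.request.sendall(self.server.snapshot())


class DumpServer(socketserver.ThreadingTCPServer):
    """Accept loop stays free because each dump is handed to a new thread."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, addr, snapshot):
        super().__init__(addr, DumpHandler)
        self.snapshot = snapshot  # callable returning the bytes to send


def start_server(addr=("127.0.0.1", 0), snapshot=lambda: XML_SNAPSHOT):
    """Start the server in a background thread and return it."""
    server = DumpServer(addr, snapshot)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With this arrangement, host b would start receiving data as soon as it
connects, instead of waiting for host a's transfer to finish.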

Thanks for your time,

-ben

-- 
Ben Hartshorne
email: [EMAIL PROTECTED]
http://ben.hartshorne.net




_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
