My experience so far:

RRD files on a ramdisk is a good idea. RRD is very basic about its I/O:
it writes (and reads) as soon as it gets a data point. In my case, a
simple blade engineering server with a plain local disk was really being
hammered by just 100 nodes, albeit at a 5-second poll, with
5-second-step RRD files and RRDs for max as well as average
consolidation. In our pilot we are moving to a SAN.

BTW, do the sums on your ramdisk space. 1000 nodes is a lot of RRD
files!
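
As an illustration, a back-of-the-envelope sizing -- the per-file size
and metric count here are assumptions, so check a real file with ls -l
and rrdtool info, since size depends on your RRA definitions:

    # ~30 metrics per node, ~12 KB per RRD file (assumed values)
    $ echo "$((1000 * 30 * 12)) KB"
    360000 KB    # roughly 350 MB, before per-cluster summary RRDs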

I/O and network load will fall in direct proportion to the metric
refresh/timeout/threshold rates and to the gmetad polling rate, so
reducing the poll to 60 seconds will have a direct benefit. But if you
do reduce it to 60 seconds, remember to change the RRD file definitions
to match. This may be teaching you to suck eggs, but a gmetad poll will
only ever give you the most recent cluster state; if the cluster
updates faster than you poll, the extra data is simply lost.
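
As an illustration (host name hypothetical), the gmetad side of a
60-second poll is just the interval field on the data_source line:

    data_source "my cluster" 60 head.example.com:8649

And a sketch of what "change the RRD definitions" means in rrdtool
terms -- illustrative values only, since gmetad creates its files
itself with its built-in definitions; the point is that step and
heartbeat have to match the slower poll, or the files fill with
unknowns:

    rrdtool create load_one.rrd --step 60 \
        DS:sum:GAUGE:120:U:U \
        RRA:AVERAGE:0.5:1:5760 \
        RRA:MAX:0.5:1:5760
    # step 60 = one sample per minute; heartbeat 120 tolerates one
    # missed poll; 5760 rows = four days at full resolution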

A direct connection, with gmetad making TCP connects to each and every
node (all 1000), is bound to be a bad, nay impossible, idea. Ask the
other experts, but I can't see it working at all. There "may" be some
use in a middle path, e.g. groups of 200 nodes unicasting to a
designated head node, and then configuring gmetad to poll (in this
case) the 5 head nodes.
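
A sketch of that middle path, with hypothetical host names. Each group
of 200 nodes unicasts to its head node, and only the 5 heads ever see a
TCP connect from gmetad:

    # gmond.conf on each node in group 1:
    udp_send_channel {
      host = head1.example.com
      port = 8649
    }

    # gmond.conf on head1 (collects the group's UDP, answers TCP):
    udp_recv_channel {
      port = 8649
    }
    tcp_accept_channel {
      port = 8649
    }

    # gmetad.conf -- 5 connects per poll instead of 1000:
    data_source "group1" 60 head1.example.com:8649
    data_source "group2" 60 head2.example.com:8649
    # ... one line per head node, 5 in all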

Another idea I have not yet explored would be to unicast all your
metrics back to the gmond running on your gmetad server. The advantage
of this is that the TCP connect to fetch the cluster state would happen
over the loopback interface of the local machine, which is faster than
an actual network transfer of the XML.
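
Something like this, with a hypothetical host name -- every node sends
to the gmond on the ganglia box (which needs udp_recv_channel and
tcp_accept_channel as usual), and gmetad polls over loopback:

    # gmond.conf on every monitored node:
    udp_send_channel {
      host = ganglia.example.com
      port = 8649
    }

    # gmetad.conf on ganglia.example.com -- the XML fetch
    # never leaves the machine:
    data_source "my cluster" 60 localhost:8649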

phew. My 2 cents worth.

kind regards,
richard

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Joel
Krauska
Sent: 27 January 2006 09:41
To: [email protected]
Subject: [Ganglia-general] Pointers on architecting a large scale
ganglia setup??


I've seen similar scaling questions asked, but not a lot of answers.

I hope this query falls on some ears with experience in the brains 
behind them.

I'm looking to deploy ganglia on a largish cluster.
(1000+ nodes)

Here were some of my thoughts on how I could help scale the system.

Any opinions or suggestions are greatly appreciated.

- Put gmetad RRD files on a ramdisk.
This should decrease the frequency of disk writes during normal runs.
If I rsync to a local disk every hour or so, I can get away with limited
disk writes and still have a reasonable backup of the data.
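
A minimal sketch of that scheme -- the path is gmetad's usual
rrd_rootdir, and the tmpfs size is an assumption, to be sized per the
note in the reply above:

    # mount a ramdisk where gmetad keeps its RRDs
    mount -t tmpfs -o size=512m tmpfs /var/lib/ganglia/rrds

    # crontab entry: copy to local disk hourly as the backup
    0 * * * * rsync -a /var/lib/ganglia/rrds/ /var/lib/ganglia/rrds.bak/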


- Use TCP polling queries instead of UDP or multicast push (disable
UDP/multicast pushing). I'd prefer to let gmetad poll instead of having
1000 UDP messages flying around at odd intervals. A good practice?
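
Taken literally (and note the caution in the reply above), each gmond
would loop its metrics back to itself over localhost so it can answer a
TCP poll with no multicast at all, and gmetad would need one
data_source -- one TCP connect per poll -- per node:

    # gmond.conf on each node:
    udp_send_channel {
      host = 127.0.0.1
      port = 8649
    }
    udp_recv_channel {
      port = 8649
    }
    tcp_accept_channel {
      port = 8649
    }

    # gmetad.conf -- 1000 lines like these, 1000 connects per poll:
    data_source "node001" 60 node001:8649
    data_source "node002" 60 node002:8649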


- Alter timers for lighter network load?
Examples? Ideas?
I was going to just start with 30- or 60-second timers in the
gmetad.conf cluster definition.


- Consider "federating"?
Create groups of 100 gmond hosts, each managed by a single gmetad, all
linking up to a core gmetad.
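
A sketch of that layout with hypothetical names: each mid-tier gmetad
aggregates its ~100 gmonds and re-serves the XML on its xml_port (8651
by default), and the core gmetad uses those ports as its data sources.
The mid-tier gmetad must list the core in trusted_hosts, or it will
only serve the XML to localhost:

    # mid-tier gmetad.conf (one of ten):
    data_source "group01" 60 head01.example.com:8649
    trusted_hosts core.example.com

    # core gmetad.conf:
    data_source "group01" 60 gmetad01.example.com:8651
    data_source "group02" 60 gmetad02.example.com:8651
    # ... one line per mid-tier gmetad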


Thanks much,

Joel Krauska

