On Thu, 21 Nov 2002, matt massie wrote:
> gmond isn't built to have multiple clusters in the xml output but gmetad 
> is specifically designed for that.  
> just put your smaller clusters on unique multicast channels (e.g. compute 
> cluster on "239.2.11.70" and tape robots on "239.2.11.71" etc etc)
> then just use gmetad to pull each of the sources together into a single 
> xml image with multiple cluster groups.

Thanks for that Matt, that's what I suspected.  I've rearranged things 
here so that I can work that way for now, and have been able to collect 
stats for a couple of small groups.

Unfortunately, I'm now hitting the same problem as Sumanth 
Jannyavula-Venk, who posted here on 22nd October - see
  http://sourceforge.net/mailarchive/forum.php?thread_id=1219650&forum_id=7186

I have a cluster currently containing about 230 dual-CPU nodes, and as 
soon as gmetad is pointed at it, the load average on the gmetad box 
rockets.  The CPU stays mostly idle, so the load looks I/O-bound, and the 
RRD updates appear to be the cause.  The partition holding the RRD files 
is ext3 (with data=ordered), in case that matters; kjournald keeps 
joining gmetad at the top of 'top's output.

Also potentially of interest, the following messages keep getting sent to 
syslog:

Nov 28 14:08:41 kick /usr/sbin/gmetad[28197]: RRD_update: illegal attempt 
to update using time 1038492521 when last update time is 1038492521 
(minimum one second step) 
Nov 28 14:08:41 kick /usr/sbin/gmetad[28197]: summary_RRD_update: illegal 
attempt to update using time 1038492521 when last update time is 
1038492521 (minimum one second step) 
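
The message itself comes from RRDtool, which rejects any update whose 
timestamp is not strictly later than the last one stored in the file.  A 
minimal Python sketch of that rule follows; FakeRRD and safe_update are 
hypothetical names for illustration, not part of gmetad or the rrdtool 
bindings, and the timestamp is the one from the log above.

```python
class FakeRRD:
    """Stand-in for one RRD file; it only tracks the last update time."""

    def __init__(self):
        self.last_update = 0
        self.samples = []

    def update(self, timestamp, value):
        # Mirrors the RRDtool rule: time must advance by at least one second.
        if timestamp <= self.last_update:
            raise ValueError(
                "illegal attempt to update using time %d when last update "
                "time is %d (minimum one second step)"
                % (timestamp, self.last_update)
            )
        self.last_update = timestamp
        self.samples.append((timestamp, value))


def safe_update(rrd, timestamp, value):
    """Skip (rather than fail on) a sample that repeats the last timestamp."""
    if timestamp <= rrd.last_update:
        return False
    rrd.update(timestamp, value)
    return True


rrd = FakeRRD()
first = safe_update(rrd, 1038492521, 0.42)   # accepted
second = safe_update(rrd, 1038492521, 0.43)  # same second: skipped
```

Two updates to the same file within one second would be enough to produce 
the syslog lines above; whether that is what gmetad is doing here is only 
a guess.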

Strangely, although the timestamps of the RRD files are being 
continuously updated, the web frontend shows no stats apart from one or 
two data points near the beginning of the gmetad run.

A valid XML feed can be obtained from the boxes of the big cluster, 
although it's worth noting that it's nearly 1MB in size!  The gmond boxes 
appear to be carrying on happily.
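
To put a number on it, here is a rough Python sketch that sizes a gmond 
XML dump and counts its HOST and METRIC elements.  It assumes gmond 
answers on the default TCP port 8649 and that gmetad keeps roughly one RRD 
file per host metric; fetch_gmond_xml, summarize and the sample document 
are made up for illustration.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=10.0):
    """Read the whole XML dump gmond sends on connect (assumed default port)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize(xml_bytes):
    """Return size in bytes plus host and metric counts for a gmond dump."""
    root = ET.fromstring(xml_bytes)
    hosts = root.findall(".//HOST")
    metrics = root.findall(".//HOST/METRIC")
    return {"bytes": len(xml_bytes), "hosts": len(hosts), "metrics": len(metrics)}


# Tiny hand-written sample in the shape of gmond output (not real data).
SAMPLE = b"""<GANGLIA_XML VERSION="2.5.1" SOURCE="gmond">
<CLUSTER NAME="compute" LOCALTIME="1038492521">
<HOST NAME="node001" IP="10.0.0.1"><METRIC NAME="load_one" VAL="0.10"/><METRIC NAME="cpu_num" VAL="2"/></HOST>
<HOST NAME="node002" IP="10.0.0.2"><METRIC NAME="load_one" VAL="0.20"/><METRIC NAME="cpu_num" VAL="2"/></HOST>
</CLUSTER>
</GANGLIA_XML>"""

summary = summarize(SAMPLE)
```

Against a live node it would be summarize(fetch_gmond_xml("node001")), 
with the hostname being a placeholder.  If the one-file-per-metric 
assumption holds, the metric count is roughly the number of RRD files 
touched on every poll.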

[Using ganglia 2.5.1 on Red Hat 7.2 boxes; the gmetad box is a single-CPU 
1.4GHz Athlon]

I hope I've provided enough information to help pin down the problem!

Regards,
Phil


