Ian,

You're right. My terminology was quite mixed up. I was jumping back into ganglia after a long time away, and working from another admin's explanations of the setup, which were slightly inaccurate.
I've solved the problem - thanks to you. What you said made me realize what was wrong with our configs. Let me try a fresh explanation... this one may be useful for the archives.

- We have roughly 10 clusters in a grid.
- Within each cluster, all of the systems unicast their data to two specific systems within the cluster.
- gmetad pulls from each of these clusters using a data_source with both 'collector hosts' as arguments. There are two collector hosts for redundancy.
- In addition, all hosts also unicast their data to another 'cluster' which consisted of 'all hosts' - this gmond instance ran on the same box as the gmetad.

There were two problems:

1. The 'all hosts' cluster had the same name as the 'grid' - this seems to cause _great_ confusion. Renaming this cluster to something else caused the grid view to immediately start working again. This became obvious to me once I understood how ganglia was working.
2. The 'all hosts' cluster gets completely overrun. gmond doesn't keep up very well, and the data has gaps and is often significantly behind.

The solution to the first one was easy - remove the naming conflict. As for the second problem, I reconfigured the hosts to send their data to the gmond running on the ganglia server only if they were not part of an existing cluster. I then named this cluster 'other hosts', and told gmetad to pull 'other-hosts' from the localhost gmond.

That fixes the problem I reported. Sadly, I have one other, very small problem - but I'll send another email for that.

Thanks for your help. Sorry for the very poor initial description.

--
Phil Dibowitz
P: 310-360-2330 C: 213-923-5115
Unix Admin, Ticketmaster.com

"Never write it in C if you can do it in 'awk';
 Never do it in 'awk' if 'sed' can handle it;
 Never use 'sed' when 'tr' can do the job;
 Never invoke 'tr' when 'cat' is sufficient;
 Avoid using 'cat' whenever possible" -- Taylor's Laws of Programming
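
P.S. For the archives, the fixed layout would look roughly like this in gmetad.conf. This is only a sketch: the grid name, the cluster names other than 'other-hosts', the hostnames, and port 8649 (the usual gmond default) are placeholders I made up for illustration, not our real values.

```
# gmetad.conf (sketch; all names and hosts below are placeholders)

# The grid name must not be the same as the name of any cluster.
gridname "ExampleGrid"

# One data_source per cluster, listing both collector hosts for redundancy.
data_source "cluster-a" collector1.cluster-a.example.com:8649 collector2.cluster-a.example.com:8649
data_source "cluster-b" collector1.cluster-b.example.com:8649 collector2.cluster-b.example.com:8649

# Hosts that belong to no cluster report to the gmond on the gmetad box.
data_source "other-hosts" localhost:8649
```

The point of the layout is that each host reports to exactly one cluster, so no single gmond has to absorb traffic from every machine in the grid.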

