Re: [Ganglia-general] Total CPU inaccuracies

Ian Cunningham Mon, 23 May 2005 12:49:30 -0700

Matt Klaric,

I am seeing this same problem as well. There seems to be a problem with how gmetad computes the summaries for each grid. It seems as though it resets its count of machines on each processing loop. When the front end asks for the summary, gmetad apparently has not yet finished counting, so you get incomplete numbers for the grid summary. The odd thing is that cluster summaries work just fine.

As an aside, I think I have noticed that you are using each machine on your cluster as a data_source. Normally you would list just one of the machines on the cluster as a data source, plus backup nodes for redundancy. If you were using multicast, all your nodes would share information on the multicast channel. This is all defined in your gmond.conf. Using the data in your example, I would suggest that your gmetad config look more like:


data_source "foo" 192.168.7.10 192.168.7.11

This way you do not need to define every node in the cluster in the gmetad config file (as separate clusters).
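
For anyone with a longer per-host list, the consolidation suggested above can be sketched in a few lines of Python. This is only an illustration: the function name merge_data_sources and the max_hosts parameter are made up for this example and are not part of Ganglia. It assumes every listed host really does belong to the same gmond cluster (i.e. they all share one multicast channel), and it treats a bare number after the quoted name as a polling interval and drops it.

```python
import re

# Matches lines such as: data_source "a" 192.168.7.10
DATA_SOURCE_RE = re.compile(r'^\s*data_source\s+"[^"]*"\s+(.*)$')


def merge_data_sources(conf_text, cluster_name, max_hosts=2):
    """Collapse per-host data_source lines into a single cluster-level line.

    Hypothetical helper: keeps the first `max_hosts` addresses (one primary
    plus backups) and leaves every other config line untouched.
    """
    hosts = []
    other_lines = []
    for line in conf_text.splitlines():
        match = DATA_SOURCE_RE.match(line)
        if match:
            for token in match.group(1).split():
                # Assumption: an all-digit token is a polling interval, not a host.
                if not token.isdigit():
                    hosts.append(token)
        else:
            other_lines.append(line)
    merged_line = 'data_source "{}" {}'.format(
        cluster_name, " ".join(hosts[:max_hosts])
    )
    return "\n".join([merged_line] + other_lines)


original = """data_source "a" 192.168.7.10
data_source "b" 192.168.7.11
data_source "c" 192.168.7.12
data_source "d" 192.168.7.13
data_source "e" 192.168.7.14
gridname "foo"
"""

merged = merge_data_sources(original, "foo")
print(merged)
# data_source "foo" 192.168.7.10 192.168.7.11
# gridname "foo"
```

Keeping two addresses mirrors the one-primary-plus-backup layout described above; raise max_hosts if you want more fallback nodes.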


Ian

Matt Klaric wrote:

I've installed Ganglia v3.0.1 and set up the web interface to gmetad.
I've set this up on a small cluster of 5 machines using the default
configuration for gmond by using the command 'gmond -t'.  I've put this
config file on all the nodes.

Then I set up my gmetad.conf file as follows:
data_source "a" 192.168.7.10
data_source "b" 192.168.7.11
data_source "c" 192.168.7.12
data_source "d" 192.168.7.13
data_source "e" 192.168.7.14
gridname "foo"

When I look at the web interface for Ganglia I notice that the image
showing the number of CPUs in the cluster is not accurate.  It
oscillates up and down over time despite nodes not being added or
removed from the cluster.  It's reporting anywhere from 8 to 14 CPUs in
the cluster when there are really 20 CPUs in the 5 boxes.  (The text to
the left of this image does indicate there are 20 CPUs in 5 hosts.)

Additionally, "Total In-core Memory" shown in the cluster on this
interface is lower than the sum of the amount of RAM in all boxes and
varies over time.

However, if I look at the stats for any one node in the cluster the
values are correct and constant over time.

Has anyone seen these kinds of problems?  How have you addressed them?

Thanks,
Matt

