Hi,
 
I am a new Ganglia user.  Overall, we are pleased with this tool, but
are still working through a few startup problems.  We are currently
monitoring 4500 CPUs in 900 hosts, across 17 clusters, including
Solaris, AIX, and 5 flavors of Linux.
 
We are using a ramdisk on the Ganglia server to get reasonable performance
(required for >2000 CPUs with our server).  Ramdisk files are rsync'd to
a file server every 15 minutes.  We plan to investigate RRDCache as an
alternative to ramdisk and would appreciate advice on the preferred
configuration for best performance on large grids, i.e. ramdisk vs.
RRDCache, number of server CPUs, and tuning suggestions.
 
Some of our new users found the sorting and color coding of clusters and
hosts confusing, particularly when a metric other than load_one was
selected.  The attached legends contain additional explanation about the
sorting and color coding conventions, fwiw.
 
Ron
 
 
Title: Ganglia Cluster Toolkit:: Node Image Legend
Ganglia Node Image Legend
Node Image   Meaning
Red          Over 100% utilization. Utilization is: (1 min load) / (number of CPUs) * 100%.
Orange       75-100%
Yellow       50-74%
Green        25-49%
Blue         0-24%
Crossbones   The node is dead. We consider a node dead when the reporting node has not heard from it in 60 sec.
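
As a rough illustration of the table above, the color bands can be sketched in Python. This is not Ganglia's own code; the function name node_color is hypothetical, and the handling of values that fall exactly on a boundary (for example 74.5% or exactly 100%) is an assumption, since the legend only lists whole-number ranges. Dead nodes (crossbones) are not modeled here.

```python
def node_color(load_one: float, num_cpus: int) -> str:
    """Map a host's 1m load and CPU count to a legend color (illustrative only)."""
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    # Utilization as defined in the legend: (1 min load) / (number of CPUs) * 100%
    utilization = load_one / num_cpus * 100.0
    if utilization > 100:
        return "red"
    if utilization >= 75:   # assumption: exactly 100% is still orange
        return "orange"
    if utilization >= 50:
        return "yellow"
    if utilization >= 25:
        return "green"
    return "blue"


print(node_color(3.93, 4))  # 98.25% utilization
print(node_color(5.00, 4))  # 125% utilization
```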

Color coding of hosts is based on the current one minute (1m) load average utilization.  Changing the metric from the
pull-down menu does not change the color coding.  Changing the metric does change the sort order of the
hosts, which are always sorted from highest to lowest for the currently selected metric.

Load average utilization depends on the number of CPU cores, as described above.  This means that for clusters
of hosts with differing numbers of CPU cores, the displayed load_one values may not decrease monotonically down
the sorted list of hosts when load_one is selected.  For example, (now: 3.93) on a four core host is 98% utilization,
which is lower than (now: 2.00) on a two core host at 100%.
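
A small Python sketch of this effect, using the two hosts from the example plus a third made-up one. The host names and the third host's values are hypothetical, and ordering by utilization is an assumption inferred from the behavior described above, not taken from the Ganglia source.

```python
# Hypothetical hosts: (name, current load_one, number of CPU cores)
hosts = [
    ("host-a", 3.93, 4),
    ("host-b", 2.00, 2),
    ("host-c", 1.00, 8),
]


def utilization(host):
    """1m load average utilization in percent for one host tuple."""
    _, load_one, cpus = host
    return load_one / cpus * 100.0


# Sort from highest to lowest utilization, as the legend describes.
by_util = sorted(hosts, key=utilization, reverse=True)

for name, load_one, cpus in by_util:
    print(f"{name}: now {load_one:.2f} on {cpus} cores")

# The raw load_one column read top to bottom is not monotonic.
loads_in_order = [load_one for _, load_one, _ in by_util]
```

Here host-b (2.00 on two cores) sorts above host-a (3.93 on four cores), so the "now" values read 2.00, 3.93, 1.00 from top to bottom.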

The 1m load average indicates the number of processes running or ready to run (on Linux it also counts
processes waiting on I/O, technically those in an uninterruptible sleep state).  Load average utilization may
therefore be higher than actual CPU load if many processes are waiting on I/O and hence not consuming CPU cycles.


Title: Ganglia Cluster Toolkit:: Cluster Image Legend
Ganglia Cluster Image Legend
Cluster Image   Meaning
Red             Over 100% utilization. Utilization is: (1 min load) / (number of CPUs in cluster) * 100%.
Orange          75-100%
Yellow          50-74%
Green           25-49%
Blue            0-24%
Grey            A private cluster.
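
For a worked example of the cluster formula, here is a minimal Python sketch. It assumes that the cluster's "1 min load" is the sum of the 1m loads of its hosts, which the legend does not state explicitly; the function name cluster_utilization and the sample values are hypothetical.

```python
def cluster_utilization(hosts):
    """Percent utilization for a cluster given (load_one, num_cpus) pairs.

    Assumption: the cluster load is the sum of the per-host 1m loads.
    """
    total_load = sum(load_one for load_one, _ in hosts)
    total_cpus = sum(cpus for _, cpus in hosts)
    if total_cpus <= 0:
        raise ValueError("cluster must have at least one CPU")
    return total_load / total_cpus * 100.0


# Two hypothetical four-CPU hosts, one fully loaded and one half loaded.
sample_cluster = [(4.0, 4), (2.0, 4)]
print(cluster_utilization(sample_cluster))  # 6.0 / 8 * 100 = 75.0
```

Under this assumption a single overloaded host can be hidden by idle neighbors, which is one reason a cluster and its hosts may show different colors.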

Sorting order for clusters and snapshots is based on the current one minute (1m) load average as listed
near the left-hand edge of the page.  Color coding of cluster snapshots is also based on the current 1m
load average.  The 1m load average is based on the number of processes running or ready to run
(or in uninterruptible sleep on Linux, normally waiting on TCP or disk I/O).

On Linux, load average utilization may be higher than actual CPU load if many processes are waiting 
on I/O and hence not consuming CPU cycles.

"Running Processes" is an instantaneous value, not averaged, so will tend to change more rapidly
than 1m load average.  The count of running processes is the same as the value reported by "top"
on Linux or in /proc/loadavg.
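
To see both values side by side on a Linux host, /proc/loadavg can be parsed as in the sketch below. The file holds the 1, 5, and 15 minute load averages, then a running/total process count, then the most recently created PID. The function name parse_loadavg and the dictionary keys are illustrative choices, and the sample line is made up so the example runs on any platform.

```python
def parse_loadavg(text: str) -> dict:
    """Split one line in /proc/loadavg format into named fields."""
    fields = text.split()
    # Fourth field looks like "1/80": currently runnable / total scheduling entities
    running, total = fields[3].split("/")
    return {
        "load_one": float(fields[0]),      # 1m average, smoothed over time
        "load_five": float(fields[1]),
        "load_fifteen": float(fields[2]),
        "proc_run": int(running),          # instantaneous, not averaged
        "proc_total": int(total),
    }


sample = "0.20 0.18 0.12 1/80 11206"
info = parse_loadavg(sample)
print(info["load_one"], info["proc_run"])
```

On a real Linux system the same function could be fed the contents of /proc/loadavg, for example via open("/proc/loadavg").read().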


------------------------------------------------------------------------------
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
