Hi,

I am a new Ganglia user. Overall, we are pleased with this tool, but we are still working through a few startup problems.

We are currently monitoring 4500 CPUs in 900 hosts across 17 clusters, including Solaris, AIX, and 5 flavors of Linux. We are using a ramdisk on the Ganglia server to get reasonable performance (required for >2000 CPUs with our server). The ramdisk files are rsync'd to a file server every 15 minutes.

We plan to investigate RRDCache as an alternative to ramdisk and would appreciate advice on the preferred configuration for best performance on large grids, i.e. ramdisk vs. RRDCache, number of server CPUs, and tuning suggestions.

Some of our new users found the sorting and color coding of clusters and hosts confusing, particularly when a metric other than load_one was selected. The attached legends contain additional explanation about the sorting and color coding conventions, fwiw.

Ron

Title: Ganglia Cluster Toolkit:: Node Image Legend
Color coding of hosts is based on the current one-minute (1m) load average utilization. Changing the metric from the pull-down menu does not change the color coding. Changing the metric does change the sort order of the hosts, which are always sorted from highest to lowest for the currently selected metric.

Load average utilization depends on the number of CPU cores, as described above. This means that for clusters of hosts with varying numbers of CPU cores, the current load_one value may not decrease monotonically in the sorted list of load_one hosts. For example, (now: 3.93) on a four-core host has lower utilization than (now: 2.00) on a two-core host.

The 1m load average is an indicator of the number of processes ready to run (or, on Linux, waiting on I/O, technically in an uninterruptible sleep state). Load average utilization may be higher than actual CPU load if many processes are waiting on I/O and hence not consuming CPU cycles.
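To make the sorting example above concrete, here is a minimal Python sketch. It assumes that "load average utilization" means the 1m load average divided by the number of CPU cores, which is my reading of the legend and not taken from the Ganglia source. The Host class, the host names, and the third host's values are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str        # hypothetical host name
    load_one: float  # current 1m load average ("now:" value)
    cpu_num: int     # number of CPU cores on the host

    @property
    def utilization(self) -> float:
        # Assumed definition: 1m load average per CPU core.
        return self.load_one / self.cpu_num


def sort_by_utilization(hosts):
    # Highest utilization first, as in the sorted host list.
    return sorted(hosts, key=lambda h: h.utilization, reverse=True)


hosts = [
    Host("node-a", 3.93, 4),  # four-core host from the legend example
    Host("node-b", 2.00, 2),  # two-core host from the legend example
    Host("node-c", 0.50, 2),  # extra made-up host
]

ranked = sort_by_utilization(hosts)
for h in ranked:
    print(f"{h.name}: now={h.load_one:.2f} cores={h.cpu_num} "
          f"utilization={h.utilization:.2f}")
```

With these values the two-core host at 2.00 sorts ahead of the four-core host at 3.93, so the raw "now:" values go up and then down when read in list order, which is the non-monotonic behavior the legend describes.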
Sorting order for clusters and snapshots is based on the current one-minute (1m) load average as listed.

On Linux, load average utilization may be higher than actual CPU load if many processes are waiting on I/O.

"Running Processes" is an instantaneous value, not averaged, so it will tend to change more rapidly.
------------------------------------------------------------------------------
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general

