Dan Moniz wrote:
<snip>
3) Load on the monitor host/head node seems higher than it should be.
It hovers around 2.6 - 3.0. While other software is running on this
host, shutting down gmetad results in load falling back down to levels
similar to other compute hosts (since the monitor host/head node is
currently also a host in the Compute Hosts cluster). Also, in concert
with the higher than expected load, ssh sessions to the monitor
host/head node seem to take a long time to establish. Again, shutting
down gmetad seems to alleviate these problems. While both of these
issues don't prevent work from being done or gmetad from working (in
the current configuration), it does seem abnormally high and is
something of an annoyance.
Does this happen all the time, or do you keep a web browser open on the
Ganglia page all the time? If so, I might know why.
Over here we noticed that when one person, and especially when several
people, keep a web browser open continuously, it puts a bigger load on
the web frontend server. This seems to happen because the cluster
overview page shows all host graphs by default and refreshes
automatically. Every time the overview refreshes, it redraws 280 host
graphs, which can be quite expensive depending on the hardware.
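
To put a rough number on that, here is a small Python sketch of how many
host graphs get redrawn per minute. The function name and the 300-second
refresh interval are my own assumptions for illustration, so substitute
whatever interval your frontend actually uses.

```python
def graphs_redrawn_per_minute(hosts, viewers, refresh_seconds):
    """Rough count of host graphs regenerated per minute.

    hosts:           host graphs on the cluster overview page
    viewers:         browsers left open on that page
    refresh_seconds: auto-refresh interval (assumed value, check your setup)
    """
    if refresh_seconds <= 0:
        raise ValueError("refresh_seconds must be positive")
    # Each open browser triggers one full redraw of every host graph
    # per refresh interval.
    return hosts * viewers * 60 / refresh_seconds


if __name__ == "__main__":
    # 280 host graphs as on our overview page; interval is an assumption.
    for viewers in (1, 3, 5):
        rate = graphs_redrawn_per_minute(280, viewers, 300)
        print(f"{viewers} viewer(s): about {rate:.0f} graphs/minute")
```

The count grows linearly with the number of open browsers, which matches
what we saw when several people left the page open.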
If this seems to be the case for you, I have a little patch that makes
the cluster overview not show the host graphs by default. This
decreased the load on our web frontend server. It still stays around 0.9
over here, but that's better than 2.5+.
I would also recommend running gmetad and the web frontend on a separate
machine and not on your head/login node, if you can spare the hardware.
You could also use a ramdisk to store the .rrd files, as Matt suggested,
if you have enough RAM in the machine. However, our cluster (275
machines) generates about 150 MB worth of .rrd files, which is a pretty
big chunk of RAM.
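
If you want to guess how big a ramdisk you would need, you could scale
from our numbers. This is only a sketch: it assumes .rrd size grows
linearly with the number of hosts and that your metric set is similar to
ours. The function name and the 25% headroom are made up for the example.

```python
def rrd_ramdisk_estimate_mb(hosts, observed_hosts=275, observed_mb=150.0,
                            headroom=1.25):
    """Estimate ramdisk size in MB for a cluster's .rrd files.

    Scales linearly from one observed data point (275 machines, about
    150 MB on our cluster). headroom is an arbitrary safety factor.
    """
    if hosts < 0 or observed_hosts <= 0:
        raise ValueError("host counts must be positive")
    per_host_mb = observed_mb / observed_hosts  # roughly 0.55 MB per host
    return hosts * per_host_mb * headroom


if __name__ == "__main__":
    for n in (50, 275, 500):
        print(f"{n} hosts: about {rrd_ramdisk_estimate_mb(n):.0f} MB")
```

Compare the result with the free RAM on the machine before committing to
a ramdisk, and remember its contents are lost on reboot unless you copy
the .rrd files back to disk periodically.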
Ramon.