Just wondering if anyone else has experienced problems with one cluster's
metrics not being reported consistently in a gmetad multi-cluster setup.
At the moment I have a (fairly homogeneous) 30-node all-Linux cluster that
reports very reliably (although for some reason cpu_num is reported as 1,
even though all the nodes are dual-processor).
I also have an 18-node all-Solaris cluster. And for some reason, even
though the monitoring core is rock-solid on all these boxes, apparently
gmetad is not polling them consistently. The RRD data shows dropouts
lasting about 1-3 minutes, then the data is back for about as long, then
more dropouts, and some or all of the hosts get shown as down (odd...).
The Solaris version *does* have a lot of metrics, and many of them get
updated at the same time ... don't know if that affects anything.
My gmetad_sources has about 15 entries for the Solaris cluster and one
entry for the Linux cluster. Is having that many somehow clouding the
issue? Is gmetad not processing all this info in time on a 360MHz Netra
t1? Will Kyosuke confess his true feelings for Madoka?
I'm hoping someone can shed light on at least two of these questions. I
can't be the only person running gmetad. ;)
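
In case anyone wants to compare dropout patterns against their own data,
here is a minimal sketch of how the gaps could be measured. It assumes the
samples have already been exported from the RRD files as (timestamp, value)
pairs, with None or NaN standing in for intervals where nothing was
recorded. The function name and the sample layout are my own assumptions
for illustration, not anything gmetad or rrdtool provides.

```python
import math


def find_dropouts(samples):
    """Return (first_missing_ts, last_missing_ts) for each run of gaps.

    samples: list of (timestamp, value) pairs sorted by timestamp.
    A value of None or NaN marks an interval with no recorded data
    (an assumed export format, not a gmetad or rrdtool structure).
    """
    dropouts = []
    run_start = None      # timestamp where the current gap began
    last_missing = None   # most recent missing timestamp in the gap
    for ts, value in samples:
        missing = value is None or (
            isinstance(value, float) and math.isnan(value)
        )
        if missing:
            if run_start is None:
                run_start = ts
            last_missing = ts
        elif run_start is not None:
            # Data came back, so close the current gap.
            dropouts.append((run_start, last_missing))
            run_start = None
    if run_start is not None:
        # The series ended while still inside a gap.
        dropouts.append((run_start, last_missing))
    return dropouts


if __name__ == "__main__":
    # Hypothetical 15-second samples for one host metric.
    demo = [(0, 1.0), (15, None), (30, float("nan")), (45, 2.0), (60, None)]
    for start, end in find_dropouts(demo):
        print("gap from", start, "to", end)
```

Running this over the same metric for every host would show whether the
gaps line up across the whole cluster (which would point at the poller) or
are scattered per host (which would point at the individual nodes).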