Just wondering if anyone else has experienced problems with one cluster's metrics not being reported consistently in a gmetad multi-cluster setup.

At the moment I have a (fairly homogeneous) 30-node all-Linux cluster that reports reliably (although, for some reason, cpu_num is reported as 1 even though all the nodes are dual-processor).

I also have an 18-node all-Solaris cluster. For some reason, even though the monitoring core is rock-solid on all these boxes, gmetad apparently is not polling them consistently. The RRD data shows dropouts lasting about 1-3 minutes, then the data is back for about as long, then more dropouts, and some or all of the hosts are shown as down (odd...).
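For anyone who wants to put numbers on a dropout pattern like the one described above, here is a minimal sketch. It assumes the RRD data has already been exported to (timestamp, value) pairs, with None standing in for a missing sample. The function name find_dropouts and the sample data are made up for illustration and are not part of Ganglia or rrdtool.

```python
def find_dropouts(samples):
    """Return (start_ts, end_ts, count) for each run of missing samples.

    `samples` is an iterable of (timestamp, value) pairs in time order;
    a value of None marks a sample that was not recorded.
    """
    runs = []
    start = None   # timestamp of the first missing sample in the current run
    last = None    # timestamp of the most recent missing sample
    count = 0      # number of missing samples in the current run
    for ts, value in samples:
        if value is None:
            if start is None:
                start = ts
                count = 0
            last = ts
            count += 1
        elif start is not None:
            # Data came back, so close the current run.
            runs.append((start, last, count))
            start = None
    if start is not None:
        # The series ended while still in a dropout.
        runs.append((start, last, count))
    return runs


# Hypothetical data: one sample every 60 seconds, None where data is missing.
samples = [(0, 0.5), (60, None), (120, None), (180, 0.7), (240, None), (300, 0.6)]
for start, end, count in find_dropouts(samples):
    print(f"dropout from t={start} to t={end} ({count} missing samples)")
```

Running this per host would show whether the gaps line up across all the Solaris nodes at once or hit them at different times.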

The Solaris version *does* have a lot of metrics, and many of them get updated at the same time; I don't know if that affects anything.

My gmetad_sources has about 15 entries for the Solaris cluster and one entry for the Linux cluster. Is having that many somehow clouding the issue? Is gmetad not processing all this info in time on a 360MHz Netra t1? Will Kyosuke confess his true feelings for Madoka?

I'm hoping someone can shed light on at least two of these questions. I can't be the only person running gmetad. ;)

