I've got the following collectd arrangement:

    Solaris Zone 1 collectd --.
    Solaris Zone 2 collectd --+--  Linux collectd -> rrd
    Solaris Zone 3 collectd --+
    Solaris Zone 4 collectd --'

So four Solaris zones, which all exist on the same host server, reporting (via network plugin) to collectd running on Linux. It actually works very well.

The binaries and configurations for all four zones are identical, except for Hostname. Most of the stats are working fine, *except* for "fork_rate" from the processes plugin.

This is where it gets weird.

"fork_rate", because these are zones and not full VMs, is the exact same metric across all four. So it's wasteful for me to be recording it four times, but not terribly so - and it helps avoid needing to flip pages when viewing the stats.

However, two of the zones are reporting "NaN" for that metric, while the other two are happily recording real, useful values. Keep in mind that this is effectively the same number being sent by all four zones. I wouldn't expect it to vary much depending on when each zone's collectd gets CPU time, and certainly not this consistently.

What is the best way to find out *why* RRD would reject a value? I've checked that the "heartbeat" of each RRD file matches the interval, and I've tried turning up syslog verbosity, but there's a lot of traffic and it's hard to pick things out when I don't know what I'm looking for.
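For reference, the heartbeat check can be scripted rather than eyeballed. The following is only a rough sketch: it parses the text that `rrdtool info <file>` prints and pulls out the step, the heartbeat, and the min/max bounds of one data source. The data source name `value` and the helper names are assumptions for illustration, not something taken from my actual files. The bounds are included because rrdtool stores an update as unknown when it falls outside a data source's min/max, so they seem worth comparing between a working zone and a NaN zone.

```python
import re
import subprocess

# Matches lines such as: ds[value].minimal_heartbeat = 20
INFO_LINE = re.compile(r'^(?P<key>\S+) = (?P<value>.*)$')


def parse_rrd_info(text):
    """Turn the text printed by `rrdtool info` into a flat {key: string} dict."""
    info = {}
    for line in text.splitlines():
        match = INFO_LINE.match(line.strip())
        if match:
            # String values are printed in double quotes; drop them.
            info[match.group('key')] = match.group('value').strip('"')
    return info


def ds_summary(info, ds='value'):
    """Collect the settings that decide whether an update is kept or unknown.

    `ds` is the data source name; 'value' is an assumption here.
    """
    prefix = 'ds[%s].' % ds
    step = int(info['step'])
    heartbeat = int(info[prefix + 'minimal_heartbeat'])
    return {
        'step': step,
        'heartbeat': heartbeat,
        # A heartbeat shorter than the step would make most updates unknown.
        'heartbeat_covers_step': heartbeat >= step,
        'type': info.get(prefix + 'type'),
        'min': float(info.get(prefix + 'min', 'nan')),
        'max': float(info.get(prefix + 'max', 'nan')),
        'unknown_sec': int(info.get(prefix + 'unknown_sec', 0)),
    }


def read_rrd_info(path):
    """Run `rrdtool info` on one file and return its output as text."""
    return subprocess.check_output(['rrdtool', 'info', path]).decode()
```

Running `ds_summary(parse_rrd_info(read_rrd_info(path)))` against the fork_rate file of a working zone and of a NaN zone, and comparing the two dicts, should show whether the files were created with different settings.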

Is there a means of detecting rrd rejections?
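Short of a proper rejection log, one thing that can at least be automated is spotting *when* the unknowns occur. This is a minimal sketch that parses the text printed by `rrdtool fetch <file> AVERAGE` and lists the timestamps whose value is NaN. The function names are hypothetical, and it assumes the usual "timestamp: value ..." row layout with a dot as the decimal separator.

```python
import math


def parse_fetch(text):
    """Parse `rrdtool fetch` output into a list of (timestamp, [values])."""
    rows = []
    for line in text.splitlines():
        stamp, sep, rest = line.partition(':')
        stamp = stamp.strip()
        # Skip the header line with the data source names and blank lines.
        if not sep or not stamp.isdigit():
            continue
        values = [float(field) for field in rest.split()]
        if values:
            rows.append((int(stamp), values))
    return rows


def unknown_timestamps(rows, column=0):
    """Return the timestamps whose value in the given column is NaN."""
    return [stamp for stamp, values in rows if math.isnan(values[column])]
```

If the list covers the whole fetched range, the file has never accepted a value; if the unknowns come and go, the timestamps give something concrete to search for in the syslog output.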

_______________________________________________
collectd mailing list
[email protected]
http://mailman.verplant.org/listinfo/collectd
