On Tue, Mar 29, 2011 at 02:30:23PM -0400, Vladimir Vuksan wrote:
>I see it all the time :-(. According to Bernard this is due to problem
>with some of the Broadcom cards. Perhaps Bernard can offer more insight.

you also get PB/s values if you fail over an IP to a different interface,
e.g. from 10gige to a backup gige. possibly there are other common cases
too, such as bringing up new or old interfaces with zero'd or pre-existing
counters.

I think some sort of generic 'is this an insane value' limiter in the
core code would be the best idea.

limiters are easy to apply if you know what the physical limits of the
interface are. eg <0 or > 1gbit/s on a gige link. not quite so easy for
things like pkts/s.
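
as a rough illustration of that idea, here is a minimal python sketch. it
is not code from ganglia; the function and constant names are made up for
the example, and the limit is just a gige link's line rate in bytes/s.

```python
# hypothetical limit: 1 gbit/s expressed in bytes/s
GIGE_MAX_BYTES_PER_S = 1e9 / 8

def is_sane_rate(rate, max_rate):
    """Return True if rate lies within the physical range [0, max_rate]."""
    # a NaN fails both comparisons, so it is rejected as well
    return 0 <= rate <= max_rate
```

anything failing the check would simply be dropped rather than reported.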

we implemented (external) limiters because switch chip resets on our
InfiniBand fabric cause the 64bit hardware byte and pkt counters on
each port of the chip to go back to zero. it's a 40gbit/s fabric
(3.2Gbyte/s of data) with fast cpus, so I impose limiters of >0 and
< 3Gbyte/s and < 10Mpkt/s on this data to make sure it is sane before
spoof'ing it into ganglia.
even though the firmware that was probing the switch chips and causing
resets is fixed now, the limiter is still good to have to protect
ganglia data from other unforeseen problems. it's a pain to have to go
in and edit rrd files.
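
for the counter-reset case, an external limiter could look something like
the sketch below. it is only an illustration in python: the function and
constant names are hypothetical, the limits are the ones quoted above, and
the real scripts are not shown in this thread.

```python
MAX_BYTES_PER_S = 3e9    # byte limit from above: < 3 Gbyte/s
MAX_PKTS_PER_S = 10e6    # packet limit from above: < 10 Mpkt/s

def limited_rate(prev, curr, dt, max_rate):
    """Turn two counter samples taken dt seconds apart into a rate.

    Returns None when the result is not believable, so the caller can
    skip reporting it instead of polluting the rrd.
    """
    if dt <= 0:
        return None
    rate = (curr - prev) / dt
    # a counter that was reset to zero gives a negative delta;
    # a bogus jump gives a rate above what the hardware can do.
    # assumption: a rate of exactly zero (idle port) is let through.
    if rate < 0 or rate >= max_rate:
        return None
    return rate
```

a sample that comes back as None is just not sent, which leaves a gap in
the graph instead of a spike.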

cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
