Re: [Ganglia-general] Network bytes spikes

2011-03-31 Thread Bernard Li
On Thu, Mar 31, 2011 at 11:18 AM, Neil Mckee n...@neilandsara.org wrote:

 (2). The impossible-counter-delta sanity checks in hsflowd depend on whether 
 the field is 32-bit or 64-bit.   The upper limit for a 32-bit counter delta 
 is 0x7FFFFFFF (about 2e9) and for a 64-bit counter it is 1e13.  These checks 
 are applied to the frames and bytes counters,  but if either check fails then 
 the sequence number is reset for the whole counter-block -- which invalidates 
 all the counter-deltas for that polling-interval.  In other words,  if the 
 bytes_in counter jumps crazily then we won't believe the frames, errors or 
 drops counters either.

 Looking at libmetrics/linux/metrics.c, it does seem that compiling with
 -DREMOVE_BOGUS_SPIKES will do more or less the same as (2).

I wonder if it's possible to implement this sanity check as a
gmetad-python plugin.  This way, the user can enable/disable this
feature on-demand without having to re-compile code.

Thoughts?

Cheers,

Bernard

--
___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general


Re: [Ganglia-general] Network bytes spikes

2011-03-31 Thread Mattias Wadenstein
On Tue, 29 Mar 2011, Bostjan Skufca wrote:

 Hi,

 occasionally I notice huge spikes in network graphs in ganglia (petabytes
 per second or so). Not sure whether those are caused by gmond restarts or
 network interface byte counter overflows or something else.
 Is someone else also seeing similar behaviour? Running latest ganglia
 (3.1.7).

Yes, we see that now and then. It is usually OS-correlated: on our Solaris
machines, network glitches (say, replugging the interface into a different
port or something similar) will trigger it. It happens less regularly on
RHEL5 derivatives, and we haven't seen it on modern 64-bit Ubuntu.

What's the ganglia-appropriate way of setting max limits for certain types 
of rrds? Currently I have a rather ugly kludge of (after adding a host):

find . -name bytes_in.rrd -print0 | xargs -0 -I RRD rrdtool tune RRD --maximum sum:300
find . -name bytes_out.rrd -print0 | xargs -0 -I RRD rrdtool tune RRD --maximum sum:300

It works, but I suspect there could be somewhere in ganglia to specify 
defaults for types of metrics.

/Mattias Wadenstein



Re: [Ganglia-general] Network bytes spikes

2011-03-31 Thread Neil Mckee
I checked the sFlow feed,  and it looks like the sanity checks for 32-bit 
rollover and impossible-counter-delta are already present in the hsflowd code 
(host-sflow.sourceforge.net  src/Linux/readNioCounters.c).  At least for the 
Linux and FreeBSD ports anyway.  We should add those checks to the Windows 
port.  Always better to clean things up at the source if you can.

That makes it less urgent to add the same sanity checks at the receiver end 
(monitor-core/gmond/sflow.c).   Sanity checks in too many places could cause 
headaches down the line (e.g. when we all have 10Tbps links).

I apologize if this is too much information about a feature that is only 
available if you compile the Ganglia trunk from sources,   but for the record:

(1). The 32-bit rollover problem is handled in hsflowd by polling faster 
internally (every 3 seconds).  This accumulates 64-bit versions of the counters 
which are then pushed out at the normal polling frequency (typically 20 
seconds).   If the code detects that the kernel counters are already 64-bit,  
then it turns off the 3-second polling.

(2). The impossible-counter-delta sanity checks in hsflowd depend on whether 
the field is 32-bit or 64-bit.   The upper limit for a 32-bit counter delta is 
0x7FFFFFFF (about 2e9) and for a 64-bit counter it is 1e13.  These checks are 
applied to the frames and bytes counters,  but if either check fails then the 
sequence number is reset for the whole counter-block -- which invalidates all 
the counter-deltas for that polling-interval.  In other words,  if the bytes_in 
counter jumps crazily then we won't believe the frames, errors or drops 
counters either.
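A minimal C sketch of the two checks described in (1) and (2), to make the arithmetic concrete. The struct `nio_counters` and the functions `accumulate32`, `field_delta`, `delta_ok` and `block_deltas` are hypothetical names, not hsflowd's actual code; only the two thresholds (0x7FFFFFFF for 32-bit fields, 1e13 for 64-bit fields), the 3-second internal polling and the all-or-nothing rejection come from the description above.

```c
#include <stdint.h>

/* Delta limits quoted in the thread for hsflowd's impossible-delta check. */
#define MAX_DELTA32 0x7FFFFFFFULL      /* about 2.1e9 */
#define MAX_DELTA64 10000000000000ULL  /* 1e13 */

/* Hypothetical counter block for one interface (not hsflowd's struct). */
typedef struct {
    uint64_t frames, bytes, errors, drops;
} nio_counters;

/* (1) Fold a raw 32-bit kernel reading into a 64-bit running total.
   Unsigned 32-bit subtraction is correct across one wrap, so readings
   must be taken often enough (3 s in hsflowd) that two wraps cannot
   happen in between. */
static void accumulate32(uint64_t *total, uint32_t *last_raw, uint32_t raw) {
    *total += (uint32_t)(raw - *last_raw);
    *last_raw = raw;
}

/* Delta of a counter whose reported field is `bits` (32 or 64) wide. */
static uint64_t field_delta(uint64_t prev, uint64_t cur, int bits) {
    if (bits == 32)
        return (uint32_t)((uint32_t)cur - (uint32_t)prev);
    return cur - prev;
}

static int delta_ok(uint64_t delta, int bits) {
    return delta <= (bits == 32 ? MAX_DELTA32 : MAX_DELTA64);
}

/* (2) Compute one interval's deltas.  If the frames or bytes delta is
   impossible, reject the whole block (standing in for the sequence-number
   reset), so errors and drops are discarded too.  Returns 1 if *out was
   filled in, 0 if the block was rejected. */
static int block_deltas(const nio_counters *prev, const nio_counters *cur,
                        int bits, nio_counters *out) {
    nio_counters d;
    d.frames = field_delta(prev->frames, cur->frames, bits);
    d.bytes  = field_delta(prev->bytes,  cur->bytes,  bits);
    d.errors = field_delta(prev->errors, cur->errors, bits);
    d.drops  = field_delta(prev->drops,  cur->drops,  bits);
    if (!delta_ok(d.frames, bits) || !delta_ok(d.bytes, bits))
        return 0;
    *out = d;
    return 1;
}
```

For example, a 32-bit bytes counter that drops from 1000000000 to 5 gives a modulo-2^32 delta of 3294967301, which exceeds 0x7FFFFFFF, so the whole block is discarded even though the frames delta looks plausible. If the 1e13 limit is applied per 20-second interval, it corresponds to about 4 Tbit/s, which is the kind of ceiling that could cause headaches once links get that fast.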

Looking at libmetrics/linux/metrics.c, it does seem that compiling with 
-DREMOVE_BOGUS_SPIKES will do more or less the same as (2).

Neil




On Mar 30, 2011, at 5:56 PM, Bernard Li wrote:

 Hi all:
 
 On Tue, Mar 29, 2011 at 11:30 AM, Vladimir Vuksan vli...@veus.hr wrote:
 
 I see it all the time :-(. According to Bernard this is due to problem
 with some of the Broadcom cards. Perhaps Bernard can offer more insight.
 
 Some old threads which describe the issue in more detail:
 
 http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04463.html
 http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04245.html
 
 I see two solutions to this problem:
 
 1) If this is indeed a driver issue, we should check to see if newer
 kernels can fix that.  Perhaps Vladimir could look into this.
 
 2) It would probably be a good thing to implement sanity check.  I
 think Neil is looking into implementing this for the sflow
 integration.  Perhaps this could be extended for gmond data as well.
 
 To help resolve this issue, I would suggest that we:
 
 1) File a bug at bugzilla.ganglia.info
 2) For all those affected, add comments to the bug providing the
 network driver model, module used, kernel version, OS version etc.
 
 Thanks!
 
 Bernard
 


Re: [Ganglia-general] Network bytes spikes

2011-03-30 Thread Martin Knoblauch
Hi Cameron,

 there are two problems:

a) overflow. 32-bit counters will not last very long on 1 Gbit or faster. They 
should not report PB spikes though.
b) some BCM adapters on Linux-64 had/have a really bad HW bug reporting bogus 
counters every now and then. That is supposed to be fixed by 
REMOVE_BOGUS_SPIKES, but only on Linux. But no guarantees. It worked for me on 
3.0.7.
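To put numbers on point a): a byte counter of width w wraps after 2^w bytes, so at a sustained line rate of R bit/s it lasts 2^w / (R/8) seconds. The helper names below are illustrative only, and the figures assume the link runs flat out in one direction.

```c
/* Bytes a counter of the given width can count before wrapping to zero. */
static double counter_range(int bits) {
    return bits == 32 ? 4294967296.0              /* 2^32 */
                      : 18446744073709551616.0;   /* 2^64 */
}

/* Seconds until a byte counter wraps when the link runs flat out at
   link_bits_per_sec.  Real traffic is usually slower, so this is the
   worst case. */
static double wrap_seconds(double link_bits_per_sec, int bits) {
    return counter_range(bits) / (link_bits_per_sec / 8.0);
}
```

At 1 Gbit/s, wrap_seconds(1e9, 32) is about 34.4 s, so a 32-bit counter sampled less often than that can wrap more than once between readings, and no delta arithmetic can recover the lost wraps. At 10 Gbit/s the figure drops to about 3.4 s, which is consistent with the 3-second internal polling Neil describes for hsflowd; a 64-bit counter at 10 Gbit/s lasts more than 400 years.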

 Cheers

 Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de



From: Cameron Spitzer cspit...@nvidia.com
To: Bostjan Skufca bost...@a2o.si
Cc: ganglia-general ganglia-general@lists.sourceforge.net
Sent: Tue, March 29, 2011 11:01:24 PM
Subject: Re: [Ganglia-general] Network bytes spikes


CPPFLAGS=-DREMOVE_BOGUS_SPIKES
had no effect in my installation.
We eventually found a patch in a non-ganglia forum somewhere, but I can't find 
it now.
It basically added input sanity checking.

The problem is a 32-bit counter on a 1 Gbps NIC can overflow in less than 
gmond's sampling interval.
When it overflows, ganglia treats the small negative number as a very large 
positive.
This is a known ganglia bug.  It's been around since 2003.  You just have to 
live with it, or try to fix it yourself.

-Cameron



 
This email message is for the sole use of the intended recipient(s) and may  
contain confidential information.  Any unauthorized review, use, disclosure  or 
distribution is prohibited.  If you are not the intended recipient,  please 
contact the sender by reply email and destroy all copies of the original  
message. 




Re: [Ganglia-general] Network bytes spikes

2011-03-30 Thread Cameron L. Spitzer

As I said, it's a known bug that will never be fixed by ganglia's
maintainers, so if you want it fixed you have to try to do it yourself. 
The bug is missing input sanity checking.
If it were documented someplace, you could call it a feature.

The HP DL3x0 and DL5x0 seem to be rather popular.  We keep buying more
because everything cheaper that we've tried has had performance or
management problems.  It's not some obscure, low volume product.


Martin Knoblauch wrote:
 Hi Cameron,

  there are two problems:

 a) overflow. 32-bit counters will not last very long on 1 Gbit or
 faster. They should not report PB spikes though.
 b) some BCM adapters on Linux-64 had/have a really bad HW bug
 reporting bogus counters every now and then. That is supposed to be
 fixed by REMOVE_BOGUS_SPIKES, but only on Linux. But no guarantees. It
 worked for me on 3.0.7.

  Cheers
  Martin
 --
 Martin Knoblauch
 email: k n o b i AT knobisoft DOT de
 www: http://www.knobisoft.de




Re: [Ganglia-general] Network bytes spikes

2011-03-30 Thread Alex Dean

On Mar 30, 2011, at 12:42 PM, Cameron L. Spitzer wrote:

 
 As I said, it's a known bug that will never be fixed by ganglia's 
 maintainers, so if you want it fixed you have to try to do it yourself.  The 
 bug is missing input sanity checking.

If you're willing/able to fix this (or you already have a patch?) it would be 
great to have the fix submitted back to Ganglia as well.

alex


Re: [Ganglia-general] Network bytes spikes

2011-03-30 Thread Cameron L. Spitzer

If I had a patch, I'd post it.
I do not understand the internals of ganglia well enough to modify it
reliably.  Only the people who wrote ganglia can do that.


Alex Dean wrote:
 If you're willing/able to fix this (or you already have a patch?) it would be 
 great to have the fix submitted back to Ganglia as well.

 alex




Re: [Ganglia-general] Network bytes spikes

2011-03-30 Thread Bernard Li
Hi all:

On Tue, Mar 29, 2011 at 11:30 AM, Vladimir Vuksan vli...@veus.hr wrote:

 I see it all the time :-(. According to Bernard this is due to problem
 with some of the Broadcom cards. Perhaps Bernard can offer more insight.

Some old threads which describe the issue in more detail:

http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04463.html
http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04245.html

I see two solutions to this problem:

1) If this is indeed a driver issue, we should check to see if newer
kernels can fix that.  Perhaps Vladimir could look into this.

2) It would probably be a good thing to implement sanity check.  I
think Neil is looking into implementing this for the sflow
integration.  Perhaps this could be extended for gmond data as well.

To help resolve this issue, I would suggest that we:

1) File a bug at bugzilla.ganglia.info
2) For all those affected, add comments to the bug providing the
network driver model, module used, kernel version, OS version etc.

Thanks!

Bernard



Re: [Ganglia-general] Network bytes spikes

2011-03-30 Thread Vladimir Vuksan
Perhaps the criticism of the Ganglia maintainers is well deserved and this 
bug will never be fixed. That said, it would be more helpful if you could 
help fix it. A lot of the original developers have moved on and we do need 
help.


Vladimir

On Wed, 30 Mar 2011, Cameron L. Spitzer wrote:



As I said, it's a known bug that will never be fixed by ganglia's maintainers, so if you
want it fixed you have to try to do it yourself.  The bug is missing input sanity
checking.
If it were documented someplace, you could call it a feature.

The HP DL3x0 and DL5x0 seem to be rather popular.  We keep buying more because everything
cheaper that we've tried has had performance or management problems.  It's not some
obscure, low volume product.




[Ganglia-general] Network bytes spikes

2011-03-29 Thread Bostjan Skufca
Hi,

occasionally I notice huge spikes in network graphs in ganglia (petabytes
per second or so). Not sure whether those are caused by gmond restarts or
network interface byte counter overflows or something else.
Is someone else also seeing similar behaviour? Running latest ganglia
(3.1.7).

b.


Re: [Ganglia-general] Network bytes spikes

2011-03-29 Thread Vladimir Vuksan

I see it all the time :-(. According to Bernard this is due to problem
with some of the Broadcom cards. Perhaps Bernard can offer more insight.

On Tue, 29 Mar 2011 20:23:31 +0200, Bostjan Skufca bost...@a2o.si wrote:
 Hi,
 
 occasionally I notice huge spikes in network graphs in ganglia (petabytes
 per second or so). Not sure whether those are caused by gmond restarts or
 network interface byte counter overflows or something else.
 Is someone else also seeing similar behaviour? Running latest ganglia
 (3.1.7).
 
 b.



Re: [Ganglia-general] Network bytes spikes

2011-03-29 Thread Bostjan Skufca
That really seems to be the case. Speaking off the top of my head, it seems
that I only see this on HP DL3x0 with Broadcom Corporation NetXtreme II
BCM5708 Gigabit Ethernet (rev 12) interfaces. I've found some threads...

Anyway, does this really work? There is something in the code that eliminates
values of 1e13 and bigger, or so it seems...

make CPPFLAGS=-DREMOVE_BOGUS_SPIKES


b.


On 29 March 2011 20:30, Vladimir Vuksan vli...@veus.hr wrote:


 I see it all the time :-(. According to Bernard this is due to problem
 with some of the Broadcom cards. Perhaps Bernard can offer more insight.



Re: [Ganglia-general] Network bytes spikes

2011-03-29 Thread Cameron Spitzer





CPPFLAGS=-DREMOVE_BOGUS_SPIKES
had no effect in my installation.
We eventually found a patch in a non-ganglia forum somewhere, but I
can't find it now.
It basically added input sanity checking.

The problem is a 32-bit counter on a 1 Gbps NIC can overflow in less
than gmond's sampling interval.
When it overflows, ganglia treats the small negative number as a very
large positive.
This is a known ganglia bug. It's been around since 2003. You just
have to live with it, or try to fix it yourself.
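One plausible way a wrapped 32-bit counter turns into a "very large positive" number is subtracting the two readings after widening them to 64 bits. The sketch below contrasts that with plain 32-bit unsigned subtraction; it is an illustration under that assumption, not gmond's actual code, and `naive_delta` and `wrap_aware_delta` are hypothetical names.

```c
#include <stdint.h>

/* Naive: widen both 32-bit readings to 64 bits before subtracting.
   After a wrap (cur < prev) the unsigned 64-bit result lands near 2^64. */
static uint64_t naive_delta(uint32_t prev, uint32_t cur) {
    return (uint64_t)cur - (uint64_t)prev;
}

/* Wrap-aware: subtract in 32-bit unsigned arithmetic, i.e. modulo 2^32.
   Correct as long as the counter wrapped at most once since `prev`. */
static uint32_t wrap_aware_delta(uint32_t prev, uint32_t cur) {
    return (uint32_t)(cur - prev);
}
```

With prev = 4294967000 and cur = 500, wrap_aware_delta returns 796 while naive_delta returns 18446744069414585116 (about 1.8e19 bytes), which at typical sampling intervals would appear as a spike of hundreds of petabytes per second or more. Modular subtraction only fixes a single clean wrap; a counter that jumps or resets (driver bugs, interface failover) still needs a separate sanity check.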

-Cameron



Bostjan Skufca wrote:
 That really seems to be the case. Speaking off the top of my head, it seems
 that I only see this on HP DL3x0 with Broadcom Corporation NetXtreme II
 BCM5708 Gigabit Ethernet (rev 12) interfaces. I've found some threads...

 Anyway, does this really work? There is something in the code that eliminates
 values of 1e13 and bigger, or so it seems...

 make CPPFLAGS=-DREMOVE_BOGUS_SPIKES

 b.
  
  


Re: [Ganglia-general] Network bytes spikes

2011-03-29 Thread Bostjan Skufca
The code where I found that constant is Linux-specific; maybe you have
something else?

b.


On 29 March 2011 23:01, Cameron Spitzer cspit...@nvidia.com wrote:


 CPPFLAGS=-DREMOVE_BOGUS_SPIKES
 had no effect in my installation.
 We eventually found a patch in a non-ganglia forum somewhere, but I can't
 find it now.
 It basically added input sanity checking.

 The problem is a 32-bit counter on a 1 Gbps NIC can overflow in less than
 gmond's sampling interval.
 When it overflows, ganglia treats the small negative number as a very large
 positive.
 This is a known ganglia bug.  It's been around since 2003.  You just have
 to live with it, or try to fix it yourself.

 -Cameron






Re: [Ganglia-general] Network bytes spikes

2011-03-29 Thread Robin Humble
On Tue, Mar 29, 2011 at 02:30:23PM -0400, Vladimir Vuksan wrote:
I see it all the time :-(. According to Bernard this is due to problem
with some of the Broadcom cards. Perhaps Bernard can offer more insight.

you also get PB/s values if you fail over an IP to a different interface,
e.g. 10GigE to a backup GigE. possibly there are other common cases too,
maybe bringing up new or old interfaces with zeroed or pre-existing
counters.

I think some sort of generic 'is this an insane value' limiter in the
core code would be the best idea.

limiters are easy to apply if you know what the physical limits of the
interface are, e.g. < 0 or > 1 Gbit/s on a GigE link. not quite so easy for
things like pkts/s.

we implemented (external) limiters because switch chip resets on our
InfiniBand fabric cause the 64-bit hardware byte and pkt counters on
each port of the chip to go back to zero. it's a 40 Gbit/s fabric
(3.2 Gbyte/s of data) with fast cpus, so I impose limiters of < 0 and
> 3 Gbyte/s and > 10 Mpkt/s on this data to make sure it is sane before
spoofing it into ganglia.
even though the firmware that was probing the switch chips and causing
resets is fixed now, the limiter is still good to have to protect
ganglia data from other unforeseen problems. it's a pain to have to go
in and edit rrd files.
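A minimal C sketch of the kind of limiter described above: reject any sample whose rate is negative or above the link's physical capacity. The `rate_limits` struct and both functions are hypothetical; the example limits (3 Gbyte/s, 10 Mpkt/s) are the ones quoted for the InfiniBand fabric. For Ethernet, the packet-rate ceiling can be derived from the minimum frame size: a 64-byte frame occupies 84 bytes on the wire once the 8-byte preamble/SFD and the 12-byte inter-frame gap are included.

```c
#include <stdbool.h>

/* Hypothetical per-link limits, in the same units as the samples. */
typedef struct {
    double max_bytes_per_sec;
    double max_pkts_per_sec;
} rate_limits;

/* Maximum Ethernet packet rate: minimum 64-byte frame plus 8 bytes of
   preamble/SFD and a 12-byte inter-frame gap = 84 bytes on the wire. */
static double ethernet_max_pps(double link_bits_per_sec) {
    return link_bits_per_sec / (84.0 * 8.0);
}

/* Returns false if either rate is physically impossible for the link
   (negative, e.g. after a counter reset, or above line rate). */
static bool sample_is_sane(double bytes_per_sec, double pkts_per_sec,
                           const rate_limits *lim) {
    if (bytes_per_sec < 0.0 || bytes_per_sec > lim->max_bytes_per_sec)
        return false;
    if (pkts_per_sec < 0.0 || pkts_per_sec > lim->max_pkts_per_sec)
        return false;
    return true;
}
```

For GigE, ethernet_max_pps(1e9) gives about 1.49 Mpkt/s, so a GigE limiter might use 125e6 bytes/s and that packet rate. Samples that fail the check are arguably best dropped (left unknown) rather than clamped to the limit, so the graphs show a gap instead of a fake plateau.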

cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility
