Re: [Ganglia-general] Network bytes spikes
On Thu, Mar 31, 2011 at 11:18 AM, Neil Mckee n...@neilandsara.org wrote:

> (2). The impossible-counter-delta sanity checks in hsflowd depend on whether the field is 32-bit or 64-bit. The upper limit for a 32-bit counter delta is 0x7FFFFFFF (about 2e9) and for a 64-bit counter it is 1e13. These checks are applied to the frames and bytes counters, but if either check fails then the sequence number is reset for the whole counter-block, which invalidates all the counter-deltas for that polling interval. In other words, if the bytes_in counter jumps crazily then we won't believe the frames, errors or drops counters either.
>
> Looking at libmetrics/linux/metrics.c, it does seem that compiling with -DREMOVE_BOGUS_SPIKES will do more or less the same as (2).

I wonder if it's possible to implement this sanity check as a gmetad-python plugin. This way, the user can enable or disable the feature on demand without having to recompile code. Thoughts?

Cheers,
Bernard

___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
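A minimal Python sketch of the impossible-counter-delta check described in (2) may help when judging whether it could live in a plugin. Only the two limits come from the thread. The function names, the dictionary layout and the choice to treat a negative delta as insane are assumptions made for illustration, and no gmetad-python plugin API is used or implied.

```python
# Limits quoted in the thread for hsflowd; everything else is hypothetical.
MAX_DELTA_32 = 0x7FFFFFFF  # about 2e9
MAX_DELTA_64 = 1e13


def delta_is_sane(prev, cur, is_64bit):
    """Return True if the counter moved forward by a plausible amount."""
    limit = MAX_DELTA_64 if is_64bit else MAX_DELTA_32
    delta = cur - prev
    # Assumption: a counter that went backwards is also treated as bogus.
    return 0 <= delta <= limit


def block_deltas(prev_block, cur_block, is_64bit,
                 checked=("bytes_in", "frames_in")):
    """Return deltas for every counter in the block, or None.

    If any checked counter fails the sanity test, the whole interval is
    discarded, mirroring the "reset the sequence number for the whole
    counter-block" behaviour described above.
    """
    for name in checked:
        if not delta_is_sane(prev_block[name], cur_block[name], is_64bit):
            return None
    return {name: cur_block[name] - prev_block[name] for name in cur_block}
```

With this shape, a caller that receives None simply skips the polling interval instead of publishing a spike.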
Re: [Ganglia-general] Network bytes spikes
On Tue, 29 Mar 2011, Bostjan Skufca wrote:

> Hi, occasionally I notice huge spikes in network graphs in ganglia (petabytes per second or so). Not sure whether those are caused by gmond restarts or network interface byte counter overflows or something else. Is someone else also seeing similar behaviour? Running latest ganglia (3.1.7).

Yes, we see that now and then. It is usually OS-correlated: on our Solaris machines, network glitches (say, replugging the interface into a different port or something similar) will trigger it. It happens less regularly on RHEL5 derivatives, and I haven't seen it on modern 64-bit Ubuntu.

What's the ganglia-appropriate way of setting max limits for certain types of rrds? Currently I have a rather ugly kludge of (after adding a host):

    find . -name bytes_in.rrd -print0 | xargs -0 -I RRD rrdtool tune RRD --maximum sum:300
    find . -name bytes_out.rrd -print0 | xargs -0 -I RRD rrdtool tune RRD --maximum sum:300

It works, but I suspect there could be somewhere in ganglia to specify defaults for types of metrics.

/Mattias Wadenstein
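The same kludge can be sketched in Python so that the limit is set in one place. This is an illustration only: the function name and the example limit of 125000000 (1 Gbit/s expressed in bytes per second) are assumptions, while the data-source name `sum` and the `rrdtool tune ... --maximum` form are taken from the commands above. The sketch only builds the command lines, so it can be inspected before anything touches the RRD files.

```python
import os


def build_tune_commands(root, maximum,
                        rrd_names=("bytes_in.rrd", "bytes_out.rrd"),
                        ds_name="sum"):
    """Return one 'rrdtool tune' argv list per matching RRD file under root."""
    commands = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if filename in rrd_names:
                path = os.path.join(dirpath, filename)
                commands.append(
                    ["rrdtool", "tune", path,
                     "--maximum", f"{ds_name}:{maximum}"]
                )
    return commands
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` once the output looks right.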
Re: [Ganglia-general] Network bytes spikes
I checked the sFlow feed, and it looks like the sanity checks for 32-bit rollover and impossible-counter-delta are already present in the hsflowd code (host-sflow.sourceforge.net src/Linux/readNioCounters.c), at least for the Linux and FreeBSD ports. We should add those checks to the Windows port. It is always better to clean things up at the source if you can.

That makes it less urgent to add the same sanity checks at the receiver end (monitor-core/gmond/sflow.c). Sanity checks in too many places could cause headaches down the line (e.g. when we all have 10Tbps links).

I apologize if this is too much information about a feature that is only available if you compile the Ganglia trunk from sources, but for the record:

(1). The 32-bit rollover problem is handled in hsflowd by polling faster internally (every 3 seconds). This accumulates 64-bit versions of the counters, which are then pushed out at the normal polling frequency (typically 20 seconds). If the code detects that the kernel counters are already 64-bit, then it turns off the 3-second polling.

(2). The impossible-counter-delta sanity checks in hsflowd depend on whether the field is 32-bit or 64-bit. The upper limit for a 32-bit counter delta is 0x7FFFFFFF (about 2e9) and for a 64-bit counter it is 1e13. These checks are applied to the frames and bytes counters, but if either check fails then the sequence number is reset for the whole counter-block, which invalidates all the counter-deltas for that polling interval. In other words, if the bytes_in counter jumps crazily then we won't believe the frames, errors or drops counters either.

Looking at libmetrics/linux/metrics.c, it does seem that compiling with -DREMOVE_BOGUS_SPIKES will do more or less the same as (2).

Neil

On Mar 30, 2011, at 5:56 PM, Bernard Li wrote:

> Hi all:
>
> On Tue, Mar 29, 2011 at 11:30 AM, Vladimir Vuksan vli...@veus.hr wrote:
>> I see it all the time :-(. According to Bernard this is due to problem with some of the Broadcom cards. Perhaps Bernard can offer more insight.
>
> Some old threads which describe the issue in more detail:
> http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04463.html
> http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04245.html
>
> I see two solutions to this problem:
> 1) If this is indeed a driver issue, we should check to see if newer kernels can fix that. Perhaps Vladimir could look into this.
> 2) It would probably be a good thing to implement a sanity check. I think Neil is looking into implementing this for the sflow integration. Perhaps this could be extended for gmond data as well.
>
> To help resolve this issue, I would suggest that we:
> 1) File a bug at bugzilla.ganglia.info
> 2) For all those affected, add comments to the bug providing the network driver model, module used, kernel version, OS version etc.
>
> Thanks!
> Bernard
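The rollover handling in (1) can be sketched as follows. This is a Python illustration of the general technique (fold frequent 32-bit readings into a wider running total), not a transcription of the hsflowd code; the class and attribute names are hypothetical.

```python
WRAP_32 = 2 ** 32


class Counter64:
    """Accumulate a wide running total from a wrapping 32-bit counter.

    The result is only correct if the raw counter wraps at most once
    between two updates, which is why it has to be polled faster than
    it can wrap.
    """

    def __init__(self):
        self.last_raw = None  # previous raw 32-bit reading
        self.total = 0        # accumulated total (Python ints do not overflow)

    def update(self, raw32):
        if self.last_raw is not None:
            # Modular subtraction gives the true increment across a wrap.
            self.total += (raw32 - self.last_raw) % WRAP_32
        self.last_raw = raw32
        return self.total
```

For example, readings of 4294967000 and then 500 add 796 to the total rather than producing a negative or enormous delta.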
Re: [Ganglia-general] Network bytes spikes
Hi Cameron,

there are two problems:

a) Overflow. 32-bit counters will not last very long on 1 Gbit or faster. They should not report PB spikes though.
b) Some BCM adapters on Linux-64 had/have a really bad HW bug, reporting bogus counters every now and then. That is supposed to be fixed by REMOVE_BOGUS_SPIKES, but only on Linux, and with no guarantees. It worked for me on 3.0.7.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

From: Cameron Spitzer cspit...@nvidia.com
To: Bostjan Skufca bost...@a2o.si
Cc: ganglia-general ganglia-general@lists.sourceforge.net
Sent: Tue, March 29, 2011 11:01:24 PM
Subject: Re: [Ganglia-general] Network bytes spikes

> CPPFLAGS=-DREMOVE_BOGUS_SPIKES had no effect in my installation. We eventually found a patch in a non-ganglia forum somewhere, but I can't find it now. It basically added input sanity checking.
>
> The problem is a 32-bit counter on a 1 Gbps NIC can overflow in less than gmond's sampling interval. When it overflows, ganglia treats the small negative number as a very large positive. This is a known ganglia bug. It's been around since 2003. You just have to live with it, or try to fix it yourself.
>
> -Cameron
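To put a number on point a), here is a small Python calculation of how long a byte counter lasts at line rate. The function name is made up for illustration; the arithmetic is just the counter range divided by the link speed in bytes per second.

```python
def seconds_to_wrap(link_bits_per_second, counter_bits=32):
    """Seconds until a byte counter wraps when the link runs at full rate."""
    bytes_per_second = link_bits_per_second / 8
    return 2 ** counter_bits / bytes_per_second


# A saturated 1 Gbit/s link wraps a 32-bit byte counter in about 34.4 seconds,
# and a 10 Gbit/s link in about 3.4 seconds.
one_gig = seconds_to_wrap(1e9)
ten_gig = seconds_to_wrap(10e9)
```

So whether a wrap can be missed depends on how the sampling interval compares with roughly 34 seconds on a fully loaded gigabit link.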
Re: [Ganglia-general] Network bytes spikes
As I said, it's a known bug that will never be fixed by ganglia's maintainers, so if you want it fixed you have to try to do it yourself. The bug is missing input sanity checking. If it were documented someplace, you could call it a feature.

The HP DL3x0 and DL5x0 seem to be rather popular. We keep buying more because everything cheaper that we've tried has had performance or management problems. It's not some obscure, low-volume product.

Martin Knoblauch wrote:

> Hi Cameron,
>
> there are two problems:
>
> a) Overflow. 32-bit counters will not last very long on 1 Gbit or faster. They should not report PB spikes though.
> b) Some BCM adapters on Linux-64 had/have a really bad HW bug, reporting bogus counters every now and then. That is supposed to be fixed by REMOVE_BOGUS_SPIKES, but only on Linux, and with no guarantees. It worked for me on 3.0.7.
>
> Cheers
> Martin
Re: [Ganglia-general] Network bytes spikes
On Mar 30, 2011, at 12:42 PM, Cameron L. Spitzer wrote:

> As I said, it's a known bug that will never be fixed by ganglia's maintainers, so if you want it fixed you have to try to do it yourself. The bug is missing input sanity checking.

If you're willing/able to fix this (or you already have a patch?) it would be great to have the fix submitted back to Ganglia as well.

alex
Re: [Ganglia-general] Network bytes spikes
If I had a patch, I'd post it. I do not understand the internals of ganglia well enough to modify it reliably. Only the people who wrote ganglia can do that.

Alex Dean wrote:

> On Mar 30, 2011, at 12:42 PM, Cameron L. Spitzer wrote:
>> As I said, it's a known bug that will never be fixed by ganglia's maintainers, so if you want it fixed you have to try to do it yourself. The bug is missing input sanity checking.
>
> If you're willing/able to fix this (or you already have a patch?) it would be great to have the fix submitted back to Ganglia as well.
>
> alex
Re: [Ganglia-general] Network bytes spikes
Hi all:

On Tue, Mar 29, 2011 at 11:30 AM, Vladimir Vuksan vli...@veus.hr wrote:

> I see it all the time :-(. According to Bernard this is due to problem with some of the Broadcom cards. Perhaps Bernard can offer more insight.

Some old threads which describe the issue in more detail:

http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04463.html
http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04245.html

I see two solutions to this problem:

1) If this is indeed a driver issue, we should check to see if newer kernels can fix that. Perhaps Vladimir could look into this.
2) It would probably be a good thing to implement a sanity check. I think Neil is looking into implementing this for the sflow integration. Perhaps this could be extended for gmond data as well.

To help resolve this issue, I would suggest that we:

1) File a bug at bugzilla.ganglia.info
2) For all those affected, add comments to the bug providing the network driver model, module used, kernel version, OS version etc.

Thanks!
Bernard
Re: [Ganglia-general] Network bytes spikes
Perhaps the criticism of Ganglia maintainers is well deserved and this bug will never be fixed. That said, it would be more helpful if you could help fix it. A lot of the original developers have moved on and we do need help.

Vladimir

On Wed, 30 Mar 2011, Cameron L. Spitzer wrote:

> As I said, it's a known bug that will never be fixed by ganglia's maintainers, so if you want it fixed you have to try to do it yourself. The bug is missing input sanity checking. If it were documented someplace, you could call it a feature.
>
> The HP DL3x0 and DL5x0 seem to be rather popular. We keep buying more because everything cheaper that we've tried has had performance or management problems. It's not some obscure, low-volume product.
[Ganglia-general] Network bytes spikes
Hi,

occasionally I notice huge spikes in network graphs in ganglia (petabytes per second or so). Not sure whether those are caused by gmond restarts or network interface byte counter overflows or something else.

Is someone else also seeing similar behaviour? Running latest ganglia (3.1.7).

b.
Re: [Ganglia-general] Network bytes spikes
I see it all the time :-(. According to Bernard, this is due to a problem with some of the Broadcom cards. Perhaps Bernard can offer more insight.

On Tue, 29 Mar 2011 20:23:31 +0200, Bostjan Skufca bost...@a2o.si wrote:

> Hi, occasionally I notice huge spikes in network graphs in ganglia (petabytes per second or so). Not sure whether those are caused by gmond restarts or network interface byte counter overflows or something else. Is someone else also seeing similar behaviour? Running latest ganglia (3.1.7).
>
> b.
Re: [Ganglia-general] Network bytes spikes
That really seems to be the case. Speaking off the top of my head, it seems that I only see this on HP DL3x0 with Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12) interfaces.

I've found some threads... Anyway, does this really work? There is something in the code which eliminates values of 1e13 and bigger, or so it seems:

    make CPPFLAGS=-DREMOVE_BOGUS_SPIKES

b.

On 29 March 2011 20:30, Vladimir Vuksan vli...@veus.hr wrote:

> I see it all the time :-(. According to Bernard this is due to problem with some of the Broadcom cards. Perhaps Bernard can offer more insight.
Re: [Ganglia-general] Network bytes spikes
CPPFLAGS=-DREMOVE_BOGUS_SPIKES had no effect in my installation. We eventually found a patch in a non-ganglia forum somewhere, but I can't find it now. It basically added input sanity checking.

The problem is that a 32-bit counter on a 1 Gbps NIC can overflow in less than gmond's sampling interval. When it overflows, ganglia treats the small negative number as a very large positive. This is a known ganglia bug. It's been around since 2003. You just have to live with it, or try to fix it yourself.

-Cameron

Bostjan Skufca wrote:

> That really seems to be the case. Speaking off the top of my head, it seems that I only see this on HP DL3x0 with Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12) interfaces.
>
> I've found some threads... Anyway, does this really work? There is something in the code which eliminates values of 1e13 and bigger, or so it seems:
>
>     make CPPFLAGS=-DREMOVE_BOGUS_SPIKES
>
> b.
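The "small negative number treated as a very large positive" effect can be reproduced with a few lines of Python. This is a sketch of the general mechanism only: it assumes the difference ends up in an unsigned 64-bit value, which may not match gmond's actual arithmetic, and both function names are hypothetical.

```python
def naive_delta(prev, cur):
    """Difference as an unsigned 64-bit subtraction would produce it.

    Assumption: the negative difference is reinterpreted as unsigned.
    """
    return (cur - prev) % 2 ** 64


def wrap_aware_delta(prev, cur, counter_bits=32):
    """Difference that accounts for a single wrap of the counter."""
    return (cur - prev) % 2 ** counter_bits


# A 32-bit counter that wrapped between two samples:
prev_sample, cur_sample = 4294967000, 500
bogus = naive_delta(prev_sample, cur_sample)       # an implausibly large value
real = wrap_aware_delta(prev_sample, cur_sample)   # 796 bytes actually counted
```

The wrap-aware version is only trustworthy if the counter wrapped at most once between the two samples.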
Re: [Ganglia-general] Network bytes spikes
The code where I found the given constant is Linux-specific; maybe you have something else?

b.

On 29 March 2011 23:01, Cameron Spitzer cspit...@nvidia.com wrote:

> CPPFLAGS=-DREMOVE_BOGUS_SPIKES had no effect in my installation. We eventually found a patch in a non-ganglia forum somewhere, but I can't find it now. It basically added input sanity checking.
>
> The problem is a 32-bit counter on a 1 Gbps NIC can overflow in less than gmond's sampling interval. When it overflows, ganglia treats the small negative number as a very large positive. This is a known ganglia bug. It's been around since 2003. You just have to live with it, or try to fix it yourself.
>
> -Cameron
Re: [Ganglia-general] Network bytes spikes
On Tue, Mar 29, 2011 at 02:30:23PM -0400, Vladimir Vuksan wrote:

> I see it all the time :-(. According to Bernard this is due to problem with some of the Broadcom cards. Perhaps Bernard can offer more insight.

You also get PB/s values if you fail over an IP to a different interface, e.g. from 10GigE to a backup GigE. Possibly there are other common cases too, such as bringing up new or old interfaces with zeroed or pre-existing counters.

I think some sort of generic 'is this an insane value' limiter in the core code would be the best idea. Limiters are easy to apply if you know what the physical limits of the interface are, e.g. 0 to 1 Gbit/s on a GigE link. It is not quite so easy for things like pkts/s.

We implemented (external) limiters because switch chip resets on our InfiniBand fabric cause the 64-bit hardware byte and pkt counters on each port of the chip to go back to zero. It's a 40 Gbit/s fabric (3.2 Gbyte/s of data) with fast CPUs, so I impose limiters of 0 and 3 Gbyte/s and 10 Mpkt/s on this data to make sure it is sane before spoofing it into ganglia. Even though the firmware that was probing the switch chips and causing resets is fixed now, the limiter is still good to have to protect ganglia data from other unforeseen problems. It's a pain to have to go in and edit rrd files.

cheers,
robin

--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility
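An external limiter of the kind described here could look roughly like the Python sketch below. The bounds of 0 to 3 Gbyte/s and 0 to 10 Mpkt/s come from the message above. The metric names, the function names and the choice to drop (rather than clamp) out-of-range samples are assumptions, and the step that actually submits values to ganglia is deliberately left out.

```python
# Bounds from the message above; the metric names are hypothetical.
LIMITS = {
    "bytes_per_sec": (0.0, 3e9),   # 0 to 3 Gbyte/s
    "pkts_per_sec": (0.0, 10e6),   # 0 to 10 Mpkt/s
}


def within_limits(metric, value, limits=LIMITS):
    """Return True if value lies inside the physical bounds for metric."""
    low, high = limits[metric]
    return low <= value <= high


def filter_samples(samples, limits=LIMITS):
    """Keep only the samples that pass their limiter.

    Assumption: insane samples are dropped, so a gap appears in the graph
    instead of a spike.
    """
    return {m: v for m, v in samples.items() if within_limits(m, v, limits)}
```

Only the samples that survive `filter_samples` would then be passed on, so a counter reset never reaches the rrd files.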