Re: [Ganglia-developers] gmond udp receive buffer errors

2012-05-02 Thread Daniel Pocock
On 26/04/12 10:38, Ramon Bastiaans wrote: I just sent in this: * https://github.com/ganglia/monitor-core/pull/34 I changed the patch to behave as you described. See the pull request for details. Hi Ramon, Thanks for contributing this patch, I see it is already checked by Jeff so I've

Re: [Ganglia-developers] gmond udp receive buffer errors

2012-04-26 Thread Ramon Bastiaans
I just sent in this: * https://github.com/ganglia/monitor-core/pull/34 I changed the patch to behave as you described. See the pull request for details. Cheers, - Ramon. On 24-4-2012 17:47, Daniel Pocock wrote: On 24/04/12 16:51, Ramon Bastiaans wrote: On 23-4-2012 15:26, Daniel Pocock

Re: [Ganglia-developers] gmond udp receive buffer errors

2012-04-24 Thread Ramon Bastiaans
On 23-4-2012 15:26, Daniel Pocock wrote: Actually, apr can be a little bit more naughty than that: for Vladimir and myself, attempting to query the buffer size from APR reports the value 0. Querying the underlying socket directly reports another value. I'm using apr-1.4.2 on Debian squeeze,

Re: [Ganglia-developers] gmond udp receive buffer errors

2012-04-24 Thread Daniel Pocock
On 24/04/12 16:51, Ramon Bastiaans wrote: On 23-4-2012 15:26, Daniel Pocock wrote: Actually, apr can be a little bit more naughty than that: for Vladimir and myself, attempting to query the buffer size from APR reports the value 0. Querying the underlying socket directly reports another
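
The point about querying the underlying socket directly can be illustrated with a minimal sketch. It is written in Python rather than against APR, so it only shows the plain getsockopt(SO_RCVBUF) query and is not the code gmond uses; the function name is hypothetical.

```python
import socket


def udp_receive_buffer_size() -> int:
    """Return the SO_RCVBUF value the kernel reports for a fresh UDP socket.

    Hypothetical helper for illustration; it bypasses any wrapper library
    and asks the socket itself.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()


if __name__ == "__main__":
    print("default UDP receive buffer:", udp_receive_buffer_size(), "bytes")
```

On Linux the reported default normally follows net.core.rmem_default, so comparing this number with the sysctl is one way to see whether a wrapper is reporting something different.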

Re: [Ganglia-developers] gmond udp receive buffer errors

2012-04-23 Thread Ramon Bastiaans
This is with gmond version 3.3.1, with a simple udp_recv_channel set like this: udp_recv_channel { port = 8669 } - Ramon. On 23-4-2012 12:03, Ramon Bastiaans wrote: Hi, While troubleshooting another network issue, I enabled the netstats.py module to report udp_rcvbufrerrors.

[Ganglia-developers] gmond udp receive buffer errors

2012-04-23 Thread Ramon Bastiaans
Hi, While troubleshooting another network issue, I enabled the netstats.py module to report udp_rcvbufrerrors. Ironically, it seems to me as if gmond itself is experiencing udp receive buffer errors. When I check out /proc/net/udp for drops, amongst other things I see: sl
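
Checking /proc/net/udp for drops can also be scripted. The sketch below assumes the usual Linux layout, where the second field is the local address as hex IP:port and the last field is the per-socket drops counter; the function name is hypothetical and this is not part of netstats.py.

```python
import os
from typing import Dict


def parse_udp_drops(text: str) -> Dict[int, int]:
    """Map local UDP port -> drops, summed over all sockets on that port.

    Assumes /proc/net/udp-style text whose header ends in a "drops"
    column. Returns an empty dict if that column is absent.
    """
    lines = text.strip().splitlines()
    if not lines or "drops" not in lines[0].split():
        return {}
    drops: Dict[int, int] = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 2:
            continue
        # local_address looks like "00000000:21DD"; the port is hexadecimal.
        port = int(fields[1].rsplit(":", 1)[1], 16)
        drops[port] = drops.get(port, 0) + int(fields[-1])
    return drops


if __name__ == "__main__":
    path = "/proc/net/udp"  # Linux only
    if os.path.exists(path):
        with open(path) as handle:
            for port, count in sorted(parse_udp_drops(handle.read()).items()):
                print(f"port {port}: {count} drops")
```

With the channel configured on port 8669, the entry to look for has a local address ending in :21DD.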

Re: [Ganglia-developers] gmond udp receive buffer errors

2012-04-23 Thread Daniel Pocock
Hi Ramon, Vladimir asked about similar errors on IRC recently. I thought buffer sizes may be an issue, so the 3.3.7 release candidate has logging of RX buffer sizes (it is logged at debug level when gmond starts). It may be interesting and helpful to compare those buffer sizes, system

Re: [Ganglia-developers] gmond udp receive buffer errors

2012-04-23 Thread Ramon Bastiaans
Hi Daniel, Ah ok. Before you sent your email I had already created a small patch for myself. It almost seems that APR ignores the OS settings (i.e.: net.core.rmem_default) and creates a socket with its own default (receive) buffer size. Attached is a patch against 3.3.6 for lib/apr_net.c
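
The general technique of requesting a larger receive buffer and then checking what the kernel granted can be sketched as follows. This is a Python illustration only, not the patch to lib/apr_net.c; the function name and the 1 MiB request in the usage part are assumptions chosen for the example.

```python
import socket
from typing import Tuple


def open_udp_socket_with_rcvbuf(requested_bytes: int) -> Tuple[socket.socket, int]:
    """Create a UDP socket, request a receive buffer size, and read it back.

    Hypothetical helper. Returns the socket and the size the kernel
    reports afterwards, which may differ from the request.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted


if __name__ == "__main__":
    requested = 1024 * 1024  # 1 MiB, an arbitrary example value
    sock, granted = open_udp_socket_with_rcvbuf(requested)
    try:
        print(f"requested {requested} bytes, kernel reports {granted} bytes")
    finally:
        sock.close()
```

On Linux the kernel doubles the requested value for bookkeeping and limits unprivileged requests to net.core.rmem_max, so a large request only takes full effect if that sysctl is raised as well.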

Re: [Ganglia-developers] gmond udp receive buffer errors

2012-04-23 Thread Daniel Pocock
On 23/04/12 22:24, Vladimir Vuksan wrote: I was having identical issues. I used your patch with the exception that I bumped the buffer size up first to 10M from the 1M you had. There was a massive improvement but I was still seeing some drops so I just decided to bump it up to 30M and it's even better

Re: [Ganglia-developers] gmond udp receive buffer errors

2012-04-23 Thread Vladimir Vuksan
Right. I have a few VM aggregators as well as physical hardware. VMs have more issues than physical hardware, but physical hardware is still susceptible to loss. This is very evident with metrics that arrive at the same time, e.g. cron-triggered gmetric jobs. Also, something unexpected happened. I have two VMs

Re: [Ganglia-developers] gmond udp receive buffer errors

2012-04-23 Thread Vladimir Vuksan
I was having identical issues. I used your patch with the exception that I bumped the buffer size up first to 10M from the 1M you had. There was a massive improvement but I was still seeing some drops so I just decided to bump it up to 30M and it's even better although I still see occasional drops. To