Right. I have a few VM aggregators as well as physical hardware. The VMs have more issues than the physical hardware, but the physical hardware is still susceptible to loss. This is very evident with metrics that arrive at the same time, e.g. cron-triggered gmetric jobs.
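
One way to observe this kind of loss on a Linux aggregator is to read the kernel's UDP counters from /proc/net/snmp, where InErrors and RcvbufErrors count datagrams dropped on receive. The following is only a minimal sketch under that assumption (Linux, standard /proc layout); the function names are hypothetical and not part of Ganglia.

```python
from pathlib import Path


def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict.

    The file holds pairs of lines per protocol: a header line with the
    counter names and a value line with the numbers, both prefixed "Udp:".
    """
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]  # does not match "UdpLite:"
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value pair found")
    names, values = udp_lines[0][1:], udp_lines[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the live counters (Linux only)."""
    return parse_udp_counters(Path(path).read_text())


if __name__ == "__main__":
    counters = read_udp_counters()
    print("InErrors:", counters.get("InErrors"),
          "RcvbufErrors:", counters.get("RcvbufErrors"))
```

Sampling these counters just before and just after the cron-triggered burst should show whether the drops line up with it. The counters are system-wide, so other UDP traffic on the host is included.
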
Also something unexpected happened. I have two VMs that are a pair, i.e. all nodes send metrics to both, so that if one fails we still have metrics. I upgraded one of them, say aggregator2. I did not touch aggregator1, yet UDP errors vanished on aggregator1 as well. Puzzling.

Vladimir

On Mon, 23 Apr 2012, Daniel Pocock wrote:

> On 23/04/12 22:24, Vladimir Vuksan wrote:
>> I was having identical issues. I used your patch with the exception that
>> I bumped up buffer size first to 10M from 1M you had. There was a
>> massive improvement but still was seeing some drops so I just decided to
>> bump it up to 30M and it's even better although I still see occasional
>> drops.
>
> If you have such a big buffer, then you could also have latency issues,
> as it suggests your CPU is just not able to process all the work in time
>
> You would either need to revise the workload (by splitting clusters,
> etc) or re-write gmond to be multithreaded (so it can use more cores)

_______________________________________________
Ganglia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
