[Ganglia-general] GMond Server + EC2 using UDP Buffer Overflow
Hi Ganglia Experts,

I need help with the GMond server specifically. I'm facing an issue where my GMond server is losing the UDP packets that all of my GMond clients and custom apps are sending.

My infra:
Infra: AWS EC2
Machine type: M1x.large
No. of machines in the cluster: 500
# of custom metrics: 4 million / day
OS: Ubuntu

I changed the rmem_max, wmem_max and somaxconn params on the GMond server machine, but I'm still facing the same issue. When I run netstat -su, I can see the RbuffError count increasing each time.

So, if anyone has configured Ganglia at this scale in AWS, please help.

Options I'm considering:
1. Having multiple GMond servers. (The drawback is that GMetad treats multiple GMond sources as different clusters, so they can't be unified as a single cluster for GMetad and the UI.)
2. Trying more Linux-level configs to overcome this.

Regards,
Manish

Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
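For readers who want to track the receive buffer errors mentioned above without eyeballing netstat output, here is a minimal sketch. It assumes a Linux host, where the UDP counters are also exposed in /proc/net/snmp as a header line and a value line both starting with "Udp:". The function names are hypothetical and are not part of Ganglia.

```python
def udp_counters(snmp_text):
    """Parse the 'Udp:' header/value line pair of /proc/net/snmp into a dict."""
    # 'UdpLite:' lines do not start with 'Udp:' and are therefore skipped.
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no 'Udp:' header/value pair found")
    header, values = rows[0], rows[1]
    # Drop the leading 'Udp:' token and pair each counter name with its value.
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return udp_counters(handle.read())


def rcvbuf_errors_delta(before, after):
    """Datagrams dropped for lack of receive buffer space between two samples."""
    return after["RcvbufErrors"] - before["RcvbufErrors"]
```

Calling read_udp_counters() twice a few seconds apart and passing both results to rcvbuf_errors_delta() gives the drop count for that interval; a steadily positive delta on the GMond server matches the symptom described in the message.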
Re: [Ganglia-general] GMond Server + EC2 using UDP Buffer Overflow
Thanks Vladimir for the quick reply!

It's 3.4.x, but I'm wondering whether this problem is more at the system / machine level than at the GMond server level. I see the receive buffer errors at the system level, so the OS is having trouble reading those UDP packets even before they reach the GMond server process. I might be missing something, but please see if that makes sense.

Plus, I also need to scale my app, which is pushing 4 million metrics/day, as I see send buffer errors on that machine.

Regards,
Manish

On Sun, Apr 27, 2014 at 11:50 AM, Vladimir Vuksan vli...@veus.hr wrote:

What version of gmond are you running? 3.5.0+ have the ability to set higher UDP Buffer sizes.

Vladimir
Re: [Ganglia-general] GMond Server + EC2 using UDP Buffer Overflow
Thanks for the reply!

I already set net.core.rmem_max = 1 ( 100 mb)

As for the other setting you mentioned ("then in my UDP receive channels I do buffer = 5000"): I believe it is not available in 3.4.x and only becomes available from 3.5.x? Or can anything be done with 3.4.x as well?

My other question: if I update the GMond code for the GMond server, do I need to upgrade the GMond clients, GMetad and the UI components as well?

Regards,
Manish

On Sun, Apr 27, 2014 at 5:43 PM, Vladimir Vuksan vli...@veus.hr wrote:

IIRC, by default gmond sets the buffer size to 128 kB, which is insufficient if you are collecting lots of metrics. These two commits change the behavior so that you can set the UDP receive buffer higher, as long as e.g. net.core.rmem_max is set:

https://github.com/ganglia/monitor-core/commit/5f5d5cad408db9f688ffef5f29524eae0871a5a7
https://github.com/ganglia/monitor-core/commit/fa8b1bc2e334dc7257f9e7562ddbd6936ca200c2

What I do is set this in sysctl.conf

net.core.rmem_max = 5100

then in my UDP receive channels I do

buffer = 5000

Otherwise, as you observed, you get UDP recv buffer errors.

Vladimir
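To illustrate why the per-socket buffer setting and net.core.rmem_max have to be raised together, here is a minimal sketch of the general mechanism. It assumes the receive buffer is requested through the standard SO_RCVBUF socket option (the thread does not show how gmond does this internally). The helper name is hypothetical. On Linux the kernel caps the request at net.core.rmem_max and reports roughly double the granted value to account for bookkeeping overhead.

```python
import socket


def effective_rcvbuf(requested_bytes):
    """Request a UDP receive buffer and return the size the kernel reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # The kernel may silently clamp this request (on Linux, to net.core.rmem_max).
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        # Read back what was actually applied instead of trusting the request.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

If effective_rcvbuf() stops growing as the requested size increases, the system-wide limit is most likely the bottleneck, and raising only the application-level buffer value would have no effect.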