[Ganglia-general] missing data on large clusters

2015-08-18 Thread Ludmil Stamboliyski
Hello, I am testing a Ganglia deployment to monitor our servers. I have several clusters - most of them are small, but I do have two large ones, with over 150 machines to monitor. The issue is that I do not receive all monitoring data from the machines in the large clusters - ganglia-web

Re: [Ganglia-general] missing data on large clusters

2015-08-19 Thread Ludmil Stamboliyski
...@a2o.si: Ludmil, do you have multiple headnodes? Do they receive data from all the nodes? If yes, did you verify it (telnet to each headnode on port 8649 and count occurrences of the HOST... xml tag)? b. On 19 August 2015 at 12:01, Ludmil Stamboliyski l.stamboliy...@ucdn.com wrote: Thank you
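
The check suggested above (connect to port 8649 on each headnode and count HOST tags) could be scripted roughly as follows. This is only a sketch: the function names `fetch_xml` and `count_hosts` are made up for illustration, and it assumes the daemon writes its full XML dump and then closes the connection.

```python
import re
import socket

# Matches an opening <HOST ...> tag, but not </HOST> and not <HOSTS ...>.
HOST_TAG = re.compile(r"<HOST\s")


def count_hosts(xml_text: str) -> int:
    """Count opening HOST tags in an XML dump."""
    return len(HOST_TAG.findall(xml_text))


def fetch_xml(host: str, port: int = 8649, timeout: float = 10.0) -> str:
    """Read everything the daemon sends until it closes the connection.

    Assumption: the peer dumps its XML on connect and then closes the socket.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example usage (hostnames are placeholders, not from the thread):
# for node in ["headnode1", "headnode2"]:
#     print(node, count_hosts(fetch_xml(node)))
```

Comparing the count per headnode against the number of machines in the cluster shows whether data is already missing at the headnode or is lost further downstream.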

Re: [Ganglia-general] missing data on large clusters

2015-08-19 Thread Ludmil Stamboliyski
of received metrics. I am still digging into what the hell is wrong with the Ubuntu TCP stack, but alas it is working fine with UDP. 2015-08-19 23:34 GMT+03:00 Ludmil Stamboliyski l.stamboliy...@ucdn.com: So... I got the culprit - it turns out that carbon-cache is slowing down the whole gmeta daemon

Re: [Ganglia-general] missing data on large clusters

2015-08-19 Thread Ludmil Stamboliyski
size on the aggregator, to the value set in the kernel sysctl net.core.rmem_max: udp_recv_channel { ... buffer = 4194304 } On the gmetad, I use memcached. It only runs the default 4 threads. Good luck, Dave On Tue, Aug 18, 2015 at 7:25 AM, Ludmil Stamboliyski l.stamboliy...@ucdn.com
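
Before setting `buffer` in `udp_recv_channel` as described above, it may help to confirm that the kernel limit is at least as large as the value requested, since Linux caps an ordinary receive-buffer request at `net.core.rmem_max`. A minimal sketch, assuming a Linux host that exposes the sysctl under `/proc`; the helper names `read_rmem_max` and `buffer_fits` are hypothetical:

```python
from pathlib import Path

# Linux exposes the sysctl net.core.rmem_max at this path.
RMEM_MAX_PATH = Path("/proc/sys/net/core/rmem_max")


def read_rmem_max(path: Path = RMEM_MAX_PATH):
    """Return the kernel's maximum receive buffer in bytes, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def buffer_fits(requested: int, rmem_max: int) -> bool:
    """True if the requested buffer does not exceed the kernel limit."""
    return requested <= rmem_max


if __name__ == "__main__":
    requested = 4194304  # the buffer value quoted in the message above
    limit = read_rmem_max()
    if limit is None:
        print("could not read net.core.rmem_max on this system")
    elif buffer_fits(requested, limit):
        print(f"ok: buffer {requested} <= rmem_max {limit}")
    else:
        print(f"buffer {requested} exceeds rmem_max {limit}; raise the sysctl first")
```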