Re: [Ganglia-general] missing data on large clusters

2015-08-19 Thread Bostjan Skufca
Ludmil, do you have multiple headnodes? Do they receive data from all the nodes? If yes, did you verify it (telnet to each headnode on port 8649 and count occurrences of the HOST... XML tag)? b. On 19 August 2015 at 12:01, Ludmil Stamboliyski l.stamboliy...@ucdn.com wrote: Thank you Dave, I've
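
The check suggested above (connect to port 8649 and count HOST tags) can also be scripted. The following is only a minimal sketch, assuming each gmond writes its full XML dump to any TCP client and then closes the connection; the function names `fetch_gmond_xml` and `count_hosts` are made up for illustration and are not part of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=10.0):
    """Read the whole XML dump from a gmond TCP port (hypothetical helper)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # assumption: gmond closes the socket after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def count_hosts(xml_bytes):
    """Return {cluster name: number of HOST elements} for one XML dump."""
    root = ET.fromstring(xml_bytes)
    return {
        cluster.get("NAME"): len(cluster.findall("HOST"))
        for cluster in root.iter("CLUSTER")
    }
```

Calling `count_hosts(fetch_gmond_xml(headnode, port))` once per headnode and port, and comparing the counts with the number of machines expected in each cluster, would show which collector is missing hosts.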

Re: [Ganglia-general] missing data on large clusters

2015-08-19 Thread Ludmil Stamboliyski
Hi Bostjan, and thank you for your time. My setup is: a gmond daemon on each monitored machine, configured in unicast; 8 clusters; and one master node on which I run a gmond daemon for each cluster, each on a different port. On the master node I have gmeta daemon configured to send data to

Re: [Ganglia-general] missing data on large clusters

2015-08-19 Thread Bostjan Skufca
Does increasing gmetad's debug level (it then runs in the foreground) yield anything useful? On 19 August 2015 at 21:15, Ludmil Stamboliyski l.stamboliy...@ucdn.com wrote: Hi Bostjan, and thank you for your time. My setup is: a gmond daemon on each monitored machine, configured in unicast; 8 clusters,

Re: [Ganglia-general] missing data on large clusters

2015-08-19 Thread Ludmil Stamboliyski
OK guys, thanks to your help we can count this as resolved. For anyone who wants to use graphite and carbon-cache, here is a piece of advice: run a separate gmeta daemon dedicated only to feeding carbon. The key is to set up carbon and gmeta to communicate by UDP - that gave me a triple increase of

Re: [Ganglia-general] missing data on large clusters

2015-08-19 Thread Ludmil Stamboliyski
Thank you Dave, I've done that, but to no avail. Then I did the following: I ran a separate gmeta for this cluster, also to no avail. Then I thought, why not make gmeta pull data every second: data_source "example large cluster" 1 127.0.0.1:port And it seems to be almost working now - I got data

Re: [Ganglia-general] missing data on large clusters

2015-08-18 Thread David Chin
Hi Ludmil: I had a similar problem a couple of years ago on a cluster with about 200 nodes. Currently, in a new place, I have about 120 nodes, running Ganglia 3.6.1. The difference in the new cluster was changing globals { send_metadata_interval } from 0 to 120, which you already have. The