Hi,

I've followed the instructions at http://sourceforge.net/apps/trac/ganglia/wiki/rrdcached_integration, but now I get no graphs at all on the web interface. rrdcached doesn't seem to be used at all:

srwxrwxrwx. 1 nobody nobody 0 May 20 10:57 rrdcached.limited.sock
-rw-r--r--. 1 nobody nobody 5 May 20 10:57 rrdcached.pid
srw-rw-r--. 1 nobody apache 0 May 20 10:57 rrdcached.sock
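One way to check whether anything is actually talking to the daemon is to ask it for its counters over the UNIX socket. The following is only a minimal sketch in Python, under stated assumptions: it sends rrdcached's `STATS` command and parses the reply. The helper names (`parse_status_line`, `parse_stats`, `rrdcached_stats`) and the socket path are hypothetical; the listing above shows only the socket file names, not their directory, and a restricted socket may refuse the command depending on how it was set up.

```python
import socket
from typing import Dict, Iterable, Tuple


def parse_status_line(line: str) -> Tuple[int, str]:
    """Split an rrdcached status line such as '9 Statistics follow'.

    A non-negative code is the number of lines that follow;
    a negative code signals an error.
    """
    code, _, message = line.strip().partition(" ")
    return int(code), message


def parse_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Turn 'Name: value' lines from a STATS reply into a dict."""
    stats = {}
    for line in lines:
        key, _, value = line.partition(":")
        stats[key.strip()] = int(value)
    return stats


def rrdcached_stats(sock_path: str, timeout: float = 5.0) -> Dict[str, int]:
    """Send STATS to the rrdcached UNIX socket at sock_path (an assumption)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(sock_path)
        s.sendall(b"STATS\n")
        reader = s.makefile("r", encoding="ascii", newline="\n")
        code, message = parse_status_line(reader.readline())
        if code < 0:
            raise RuntimeError("rrdcached refused STATS: " + message)
        return parse_stats(reader.readline() for _ in range(code))


# Hypothetical usage; replace the path with the real socket location:
# print(rrdcached_stats("/path/to/rrdcached.sock"))
```

If `UpdatesReceived` does not grow between two calls made a minute or so apart, gmetad is probably not sending its updates through the daemon.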
Cumprimentos / Best regards,
Cristóvão José Domingues Cordeiro

________________________________________
From: Vladimir Vuksan [vli...@veus.hr]
Sent: 19 May 2014 16:53
To: Cristovao Jose Domingues Cordeiro; ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Random blank timeslots in graphs

I would definitely consider rrdcached backed by some SSDs. That is what I use. 3.7.0, which is in testing, has some additional performance enhancements, but I think your issue really is I/O.

Vladimir

On 05/19/2014 10:46 AM, Cristovao Jose Domingues Cordeiro wrote:

Hi,

I am using Ganglia Web Frontend version 3.5.12 and Ganglia Web Backend (gmetad) version 3.6.0. The gmond version on the nodes is not consistent, since they are set up by different users in different environments, but I believe none of them is below 3.1.7.

No, I am not using RRDCached... all of my Ganglia configurations are the default ones. I'll try to set that up. Since you believe it is a scaling problem, should I try to store the DB on a ramdisk?

Cumprimentos / Best regards,
Cristóvão José Domingues Cordeiro

________________________________
From: Vladimir Vuksan [vli...@veus.hr]
Sent: 19 May 2014 16:37
To: Cristovao Jose Domingues Cordeiro; ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Random blank timeslots in graphs

The "Error 1 sending" messages are a red herring. If you are seeing gaps, it's most likely that the storage system is not keeping up. What version of Ganglia are you using, and are you using rrdcached?

Vladimir

On 05/19/2014 10:20 AM, Cristovao Jose Domingues Cordeiro wrote:

Hi,

this is happening on two completely different Ganglia headnodes (both deployed with the same method). I'm monitoring about 500 VMs on each headnode, separated into clusters of different sizes.
From time to time, the summary graphs for some cluster stop reporting, showing zero activity, and then after a while they suddenly come back. This is very undesirable, since I end up with several white "holes" per day on each cluster.

The information I can give you so far is the following:

* The attached image shows what happens.
* I have a master-slave type of configuration, where the collector gmonds sit on the same machine (the headnode) as gmetad and ganglia-web, and all the gmond nodes report their metrics to the headnode over unicast.
* I have the latest Ganglia versions running (both core and web).
* All VMs are based on SL6.
* When I look at /var/log/messages I see a lot of this:

May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for pkts_out#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for heartbeat#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_user#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_system#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_idle#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_nice#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_aidle#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_wio#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_steal#012
May 19 16:14:37 gangliamon gmond[22304]: Error 1 sending the modular data for heartbeat#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_user#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_system#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_idle#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_nice#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_aidle#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_wio#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_steal#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for mem_free#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for mem_shared#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for mem_buffers#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for mem_cached#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for swap_free#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for bytes_out#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for bytes_in#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for pkts_in#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for pkts_out#012
May 19 16:14:40 gangliamon gmond[10560]: Error 1 sending the modular data for heartbeat#012
May 19 16:14:42 gangliamon gmond[22304]: Error 1 sending the modular data for disk_free#012
....

This, as I understand it, is a known unsolved issue, judging by other discussions such as https://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg06602.html .

Does anyone know how to solve this?

Thanks

Cumprimentos / Best regards,
Cristóvão José Domingues Cordeiro

------------------------------------------------------------------------------
"Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
Instantly run your Selenium tests across 300+ browser/OS combos.
Get unparalleled scalability from the best Selenium testing platform available
Simple to use. Nothing to install. Get started now for free."
http://p.sf.net/sfu/SauceLabs
_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
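
For anyone who wants to see which gmond processes and metrics produce the "Error 1 sending the modular data" messages quoted earlier in the thread, they can be tallied with a few lines of Python. This is only a rough sketch: the function name `tally_send_errors` and the regular expression are assumptions based on the log excerpt, not anything shipped with Ganglia. Since the reply in this thread calls these errors a red herring, the counts are mainly useful for confirming that they do not line up with the gaps in the graphs.

```python
import re
from collections import Counter

# Matches lines like:
# "May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for pkts_out#012"
# The metric name stops at the first non-word character, so the trailing
# "#012" (syslog's escaped newline) is left out.
LINE_RE = re.compile(
    r"gmond\[(?P<pid>\d+)\]: Error 1 sending the modular data for (?P<metric>\w+)"
)


def tally_send_errors(lines):
    """Count matching error lines per gmond PID and per metric name."""
    by_pid = Counter()
    by_metric = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            by_pid[int(match.group("pid"))] += 1
            by_metric[match.group("metric")] += 1
    return by_pid, by_metric


sample = [
    "May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for pkts_out#012",
    "May 19 16:14:37 gangliamon gmond[22304]: Error 1 sending the modular data for heartbeat#012",
    "May 19 16:14:40 gangliamon gmond[10560]: Error 1 sending the modular data for heartbeat#012",
]
by_pid, by_metric = tally_send_errors(sample)
print(by_pid.most_common())
print(by_metric.most_common())
```

To run it against the real log, pass an open file object for /var/log/messages instead of `sample`.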