Hi,
I am using Ganglia Web Frontend version 3.5.12 and Ganglia Web Backend (gmetad)
version 3.6.0. The gmond version on the nodes is not consistent, since the nodes
are set up by different users in different environments, but I believe none of
them is below 3.1.7.
No, I am not using rrdcached. All of my Ganglia configuration is the default.
I'll try to set rrdcached up.
Since you believe it is a scaling problem, should I try storing the DB on a
ramdisk?
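
To judge whether this really is a scaling problem, the write load on the RRD store can be estimated roughly. This is only a back-of-the-envelope sketch: the 500 hosts come from this thread, while the 30 metrics per host and the 15-second polling interval are assumptions that should be replaced with the real values from the headnode. The function name is hypothetical.

```python
def rrd_updates_per_second(hosts: int, metrics_per_host: int,
                           poll_interval_s: float) -> float:
    """Estimate how many RRD file updates gmetad issues per second.

    Assumes one RRD file per host metric, each updated once per polling
    interval. Summary RRDs are ignored, so this is a lower bound.
    """
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    return hosts * metrics_per_host / poll_interval_s


if __name__ == "__main__":
    # 500 hosts is from this thread; 30 metrics and 15 s are assumed values.
    rate = rrd_updates_per_second(hosts=500, metrics_per_host=30,
                                  poll_interval_s=15)
    print(f"approx. {rate:.0f} RRD updates per second")
```

If the estimate is high compared with what the disk can sustain in small random writes, batching the updates (which is what rrdcached does) or moving the files to faster storage are the usual things to try.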
Cumprimentos / Best regards,
Cristóvão José Domingues Cordeiro
________________________________
From: Vladimir Vuksan [vli...@veus.hr]
Sent: 19 May 2014 16:37
To: Cristovao Jose Domingues Cordeiro; ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Random blank timeslots in graphs
The "Error 1 sending" messages are a red herring.
If you are seeing gaps, it is most likely that the storage system is not keeping up.
Which version of Ganglia are you using, and are you using rrdcached?
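
One way to check whether the gaps are actually stored in the RRD files is to dump a file with "rrdtool fetch" and look for runs of unknown values. The sketch below parses text in that output shape and reports the gaps. The sample data is made up for illustration, and the function name is hypothetical.

```python
import math


def find_gaps(fetch_output: str):
    """Return (start, end) timestamp pairs for runs of rows with no data.

    Expects text in the shape printed by `rrdtool fetch`: data rows look
    like "<unix timestamp>: <value> [<value> ...]", and unknown values
    are printed as "nan" or "-nan". Other lines are skipped.
    """
    gaps = []
    gap_start = None
    last_ts = None
    for line in fetch_output.splitlines():
        ts_text, sep, values_text = line.partition(":")
        if not sep or not ts_text.strip().isdigit():
            continue  # header or blank line
        ts = int(ts_text)
        values = [float(v) for v in values_text.split()]
        missing = all(math.isnan(v) for v in values)
        if missing and gap_start is None:
            gap_start = ts  # first row of a new gap
        elif not missing and gap_start is not None:
            gaps.append((gap_start, last_ts))  # gap ended at previous row
            gap_start = None
        last_ts = ts
    if gap_start is not None:
        gaps.append((gap_start, last_ts))  # gap runs to the end of the data
    return gaps


# Made-up sample in the rrdtool fetch output shape.
SAMPLE = """\
                 sum

1400500000: 1.2000000000e+01
1400500015: nan
1400500030: -nan
1400500045: 1.1000000000e+01
1400500060: nan
"""

if __name__ == "__main__":
    for start, end in find_gaps(SAMPLE):
        print(f"no data from {start} to {end}")
```

The input could come from something like "rrdtool fetch <file>.rrd AVERAGE" run against one of the summary RRDs. Where those files live depends on the rrd_rootdir setting in gmetad.conf.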
Vladimir
On 05/19/2014 10:20 AM, Cristovao Jose Domingues Cordeiro wrote:
Hi,
this is happening on two completely separate Ganglia headnodes (both deployed
with the same method).
I'm monitoring about 500 VMs on each headnode, separated into clusters of
different sizes. From time to time, the summary graphs for some cluster stop
reporting, showing zero activity, and then after a while they come back up
again.
This is very undesirable, since I end up with several white "holes" per day in
each cluster's graphs.
The information I can give you so far is the following:
* The attached image shows what happens
* I have a master-slave type of configuration, where the collector gmonds
are sitting on the same machine (the headnode) as gmetad and ganglia-web, and
where all the gmond nodes report their metrics to the headnode via unicast.
* I have the latest Ganglia versions running (both core and web)
* All VMs are based on SL6
* When I look at /var/log/messages I see a lot of this:
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for pkts_out#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for heartbeat#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_user#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_system#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_idle#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_nice#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_aidle#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_wio#012
May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for cpu_steal#012
May 19 16:14:37 gangliamon gmond[22304]: Error 1 sending the modular data for heartbeat#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_user#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_system#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_idle#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_nice#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_aidle#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_wio#012
May 19 16:14:38 gangliamon gmond[10560]: Error 1 sending the modular data for cpu_steal#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for mem_free#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for mem_shared#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for mem_buffers#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for mem_cached#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for swap_free#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for bytes_out#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for bytes_in#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for pkts_in#012
May 19 16:14:39 gangliamon gmond[22300]: Error 1 sending the modular data for pkts_out#012
May 19 16:14:40 gangliamon gmond[10560]: Error 1 sending the modular data for heartbeat#012
May 19 16:14:42 gangliamon gmond[22304]: Error 1 sending the modular data for disk_free#012
....
I understand this is a known unsolved issue, judging by other discussions
such as:
https://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg06602.html
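
To see which gmond processes and which metrics the "Error 1 sending" lines come from, the syslog text can be tallied with a short script. This is a minimal sketch that assumes the message format shown in the excerpt above. The function name is hypothetical, and the pattern tolerates line wrapping inside a message.

```python
import re
from collections import Counter

# Matches e.g. "gmond[22292]: Error 1 sending the modular data for pkts_out#012".
# \s+ between words lets the pattern span a wrapped line.
PATTERN = re.compile(
    r"gmond\[(\d+)\]:\s+Error\s+1\s+sending\s+the\s+modular\s+data\s+for\s+(\w+)#012"
)


def tally_send_errors(log_text: str):
    """Count "Error 1 sending" messages per gmond PID and per metric name."""
    by_pid = Counter()
    by_metric = Counter()
    for pid, metric in PATTERN.findall(log_text):
        by_pid[int(pid)] += 1
        by_metric[metric] += 1
    return by_pid, by_metric


if __name__ == "__main__":
    # Three lines taken from the excerpt in this message.
    sample = (
        "May 19 16:14:36 gangliamon gmond[22292]: Error 1 sending the modular data for\n"
        "pkts_out#012\n"
        "May 19 16:14:37 gangliamon gmond[22304]: Error 1 sending the modular data for\n"
        "heartbeat#012\n"
        "May 19 16:14:40 gangliamon gmond[10560]: Error 1 sending the modular data for\n"
        "heartbeat#012\n"
    )
    pids, metrics = tally_send_errors(sample)
    print(pids.most_common())
    print(metrics.most_common())
```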
Does anyone know how to solve this?
Thanks
Cumprimentos / Best regards,
Cristóvão José Domingues Cordeiro
------------------------------------------------------------------------------
"Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
Instantly run your Selenium tests across 300+ browser/OS combos.
Get unparalleled scalability from the best Selenium testing platform available
Simple to use. Nothing to install. Get started now for free."
http://p.sf.net/sfu/SauceLabs
_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general