On Sat, 28/07/2007 at 02.04 +0100, richard grevis wrote:
> Scenario 2 (maybe your case) - [..]
> - Errors for 'summary' RRDs and not host RRDs? You almost certainly have
> two distinct clusters that have the same cluster name at the grid level.
> netcat the gmetad server on port 8651 and look for cluster dups. Also
> nc all clusters and grep for identical cluster names on distinct clusters.
> It will be the headnode gmond.conf on a headnode that needs fixing.
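The duplicate-name check suggested above can be scripted. Below is a minimal Python sketch that reads the XML dump a gmetad sends on its XML port and reports every cluster name that occurs more than once. It takes port 8651 from the quote. It assumes that clusters appear as CLUSTER elements with a NAME attribute, possibly nested inside GRID elements. `fetch_xml` and `duplicate_clusters` are hypothetical helper names, not part of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET
from collections import Counter


def fetch_xml(host, port=8651, timeout=10.0):
    """Read the XML dump gmetad sends on connect (what `nc host 8651` prints).

    Assumes the server writes the whole dump and then closes the connection.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the socket: dump complete
                break
            chunks.append(data)
    # Return bytes so ElementTree honours the encoding in the XML declaration.
    return b"".join(chunks)


def duplicate_clusters(xml_data):
    """Return {name: count} for every CLUSTER NAME seen more than once.

    root.iter() walks the whole tree, so CLUSTER elements nested at any
    depth inside GRID elements are all counted.
    """
    root = ET.fromstring(xml_data)
    names = Counter(c.get("NAME") for c in root.iter("CLUSTER"))
    return {name: n for name, n in names.items() if n > 1}
```

For example, `duplicate_clusters(fetch_xml("central-gmetad"))` checks the central gmetad, where `central-gmetad` is a placeholder host name. Repeating the call against each remote gmetad covers the "nc all clusters" step.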
It's very strange. Our configuration is simple: each remote cluster runs gmond on all nodes and gmetad on one node, and a central gmetad runs on the machine that hosts the web frontend.

Central gmetad config:

web_frontend # grep -v "#" /etc/gmetad.conf
data_source "SP5" 25 xxx.xxx.xxx.xxx:8651
data_source "GNU_Linux_Cluster" 25 xxx.xxx.xxx.xxx:8651
data_source "Cray_XD1_Linux_Cluster" 25 xxx.xxx.xxx.xxx:8651
data_source "Front_End_Cluster" 25 xxx.xxx.xxx.xxx:8651
data_source "BCC_Linux_Cluster" 25 xxx.xxx.xxx.xxx:8651
data_source "BCX_Linux_Cluster" 25 xxx.xxx.xxx.xxx:8651
scalable off
gridname "CINECA"
authority "http://<web_frontend_address>"
rrd_rootdir "/dev/shm/ganglia/rrds"

Remote clusters gmetad config:

clusterN # grep -v "#" /etc/gmetad.conf
data_source "GNU_Linux_Cluster" 25 10.10.12.1 10.10.12.2 10.10.12.100
authority "http://<web_frontend_address>"
trusted_hosts <ip_of_webfrontend>
rrd_rootdir "/dev/shm/rrds"

With ganglia versions later than 3.0.2 we get hundreds of these errors, for all clusters and all metrics:

Jul 30 17:33:25 tanabis /usr/sbin/gmetad[12217]: RRD_update (/dev/shm/ganglia/rrds/BCC_Linux_Cluster/__SummaryInfo__/mem_free.rrd): illegal attempt to update using time 1185809583 when last update time is 1185809583 (minimum one second step)
Jul 30 17:33:25 tanabis /usr/sbin/gmetad[12217]: RRD_update (/dev/shm/ganglia/rrds/BCC_Linux_Cluster/__SummaryInfo__/cpu_system.rrd): illegal attempt to update using time 1185809583 when last update time is 1185809583 (minimum one second step)
Jul 30 17:33:25 tanabis /usr/sbin/gmetad[12217]: RRD_update (/dev/shm/ganglia/rrds/BCC_Linux_Cluster/__SummaryInfo__/proc_run.rrd): illegal attempt to update using time 1185809583 when last update time is 1185809583 (minimum one second step)
Jul 30 17:33:25 tanabis /usr/sbin/gmetad[12217]: RRD_update (/dev/shm/ganglia/rrds/BCC_Linux_Cluster/__SummaryInfo__/mem_total.rrd): illegal attempt to update using time 1185809583 when last update time is 1185809583 (minimum one second step)
[..]
Jul 30 17:33:33 tanabis /usr/sbin/gmetad[12217]: RRD_update (/dev/shm/ganglia/rrds/SP5/__SummaryInfo__/disk_free.rrd): illegal attempt to update using time 1185809605 when last update time is 1185809605 (minimum one second step)
Jul 30 17:33:33 tanabis /usr/sbin/gmetad[12217]: RRD_update (/dev/shm/ganglia/rrds/SP5/__SummaryInfo__/bytes_out.rrd): illegal attempt to update using time 1185809605 when last update time is 1185809605 (minimum one second step)
Jul 30 17:33:33 tanabis /usr/sbin/gmetad[12217]: RRD_update (/dev/shm/ganglia/rrds/SP5/__SummaryInfo__/proc_total.rrd): illegal attempt to update using time 1185809605 when last update time is 1185809605 (minimum one second step)
[..]
Jul 30 17:33:36 tanabis /usr/sbin/gmetad[12217]: RRD_update (/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/disk_free.rrd): illegal attempt to update using time 1185809606 when last update time is 1185809606 (minimum one second step)
Jul 30 17:33:36 tanabis /usr/sbin/gmetad[12217]: RRD_update (/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/bytes_out.rrd): illegal attempt to update using time 1185809606 when last update time is 1185809606 (minimum one second step)
Jul 30 17:33:36 tanabis /usr/sbin/gmetad[12217]: RRD_update (/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/proc_total.rrd): illegal attempt to update using time 1185809606 when last update time is 1185809606 (minimum one second step)
Jul 30 17:33:36 tanabis /usr/sbin/gmetad[12217]: RRD_update (/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/cpu_nice.rrd): illegal attempt to update using time 1185809606 when last update time is 1185809606 (minimum one second step)
Jul 30 17:33:36 tanabis /usr/sbin/gmetad[12217]: RRD_update (/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/pkts_in.rrd): illegal attempt to update using time 1185809606 when last update time is 1185809606 (minimum one second step)
And so on...
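rrdtool refuses an update whose timestamp is not later than the last one stored in the RRD (hence "minimum one second step"). Each of these lines therefore means gmetad tried to write the same summary RRD twice with the same timestamp. The minimal Python sketch below tallies such same-second rejections per cluster and metric, which makes it easier to compare logs from different ganglia versions. It assumes the path layout seen in the log (`<rrd_rootdir>/<cluster>/__SummaryInfo__/<metric>.rrd`), and `summarize` is a hypothetical name.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Matches the gmetad syslog message shown above; works on one-line or wrapped text.
ERR = re.compile(
    r"RRD_update \((?P<path>[^)]+)\): illegal attempt to update using time "
    r"(?P<t>\d+) when last update time is (?P<last>\d+)"
)


def summarize(log_text):
    """Count same-timestamp RRD_update rejections per (cluster, metric)."""
    counts = Counter()
    for m in ERR.finditer(log_text):
        if m.group("t") != m.group("last"):
            continue  # only count updates that repeat the stored timestamp
        p = PurePosixPath(m.group("path"))
        # Assumed layout: .../<cluster>/__SummaryInfo__/<metric>.rrd
        counts[(p.parts[-3], p.stem)] += 1
    return counts
```

Compare the metric names in the result for a post-3.0.2 log with those for a 3.0.2 log to see whether the rejections hit all metrics or only one.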
With ganglia 3.0.2 we see the errors only for the disk_free metric:

Jul 30 17:42:59 tanabis /usr/sbin/gmetad[13501]: RRD_update (/dev/shm/ganglia/rrds/SP5/__SummaryInfo__/disk_free.rrd): illegal attempt to update using time 1185810165 when last update time is 1185810165 (minimum one second step)
Jul 30 17:43:02 tanabis /usr/sbin/gmetad[13501]: RRD_update (/dev/shm/ganglia/rrds/BCC_Linux_Cluster/__SummaryInfo__/disk_free.rrd): illegal attempt to update using time 1185810173 when last update time is 1185810173 (minimum one second step)
Jul 30 17:43:05 tanabis /usr/sbin/gmetad[13501]: RRD_update (/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/disk_free.rrd): illegal attempt to update using time 1185810167 when last update time is 1185810167 (minimum one second step)
Jul 30 17:43:08 tanabis /usr/sbin/gmetad[13501]: RRD_update (/dev/shm/ganglia/rrds/GNU_Linux_Cluster/__SummaryInfo__/disk_free.rrd): illegal attempt to update using time 1185810168 when last update time is 1185810168 (minimum one second step)
Jul 30 17:43:11 tanabis /usr/sbin/gmetad[13501]: RRD_update (/dev/shm/ganglia/rrds/Front_End_Cluster/__SummaryInfo__/disk_free.rrd): illegal attempt to update using time 1185810171 when last update time is 1185810171 (minimum one second step)
Jul 30 17:43:13 tanabis /usr/sbin/gmetad[13501]: RRD_update (/dev/shm/ganglia/rrds/BCX_Linux_Cluster/__SummaryInfo__/disk_free.rrd): illegal attempt to update using time 1185810183 when last update time is 1185810183 (minimum one second step)
Jul 30 17:43:26 tanabis /usr/sbin/gmetad[13501]: RRD_update (/dev/shm/ganglia/rrds/SP5/__SummaryInfo__/disk_free.rrd): illegal attempt to update using time 1185810186 when last update time is 1185810186 (minimum one second step)
Jul 30 17:43:28 tanabis /usr/sbin/gmetad[13501]: RRD_update (/dev/shm/ganglia/rrds/BCC_Linux_Cluster/__SummaryInfo__/disk_free.rrd): illegal attempt to update using time 1185810196 when last update time is 1185810196 (minimum one second step)
Jul 30 17:43:28 tanabis /usr/sbin/gmetad[13501]: RRD_update (/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/disk_free.rrd): illegal attempt to update using time 1185810187 when last update time is 1185810187 (minimum one second step)
Jul 30 17:43:32 tanabis /usr/sbin/gmetad[13501]: RRD_update (/dev/shm/ganglia/rrds/GNU_Linux_Cluster/__SummaryInfo__/disk_free.rrd): illegal attempt to update using time 1185810193 when last update time is 1185810193 (minimum one second step)
Jul 30 17:43:34 tanabis /usr/sbin/gmetad[13501]: RRD_update (/dev/shm/ganglia/rrds/Front_End_Cluster/__SummaryInfo__/disk_free.rrd): illegal attempt to update using time 1185810192 when last update time is 1185810192 (minimum one second step)

Best regards,
--
Andrea Capriotti
System Management Group - Cineca - www.cineca.it
Tel +39 051 6171890

Ganglia-general mailing list
https://lists.sourceforge.net/lists/listinfo/ganglia-general

