On Sat, 28/07/2007 at 02:04 +0100, richard grevis wrote:

> Scenario 2 (maybe your case) -
[..]
> - Errors for 'summary' RRDs and not host RRDs? You almost certainly have
>   two distinct clusters that have the same cluster name at the grid level.
>   netcat the gmetad server on port 8651 and look for cluster dups. Also
>   nc all clusters and grep for identical cluster names on distinct clusters.
>   It will be the headnode gmond.conf on a headnode that needs fixing.
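
For reference, the duplicate check suggested above can also be scripted
instead of eyeballing the nc output. This is only a rough sketch: it assumes
the dump on port 8651 is well-formed XML with CLUSTER elements carrying a
NAME attribute. The function names and the trimmed sample are made up for
illustration and are not part of ganglia.

```python
import socket
import xml.etree.ElementTree as ET
from collections import Counter


def fetch_xml(host, port=8651, timeout=10.0):
    """Read the whole XML dump gmetad writes on connect (like `nc host 8651`)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def duplicate_cluster_names(xml_data):
    """Return {cluster_name: count} for cluster names that appear more than once."""
    root = ET.fromstring(xml_data)
    names = Counter(c.get("NAME") for c in root.iter("CLUSTER"))
    return {name: n for name, n in names.items() if n > 1}


# Hypothetical, heavily trimmed dump used only to exercise the function.
SAMPLE = """<GANGLIA_XML SOURCE="gmetad">
<GRID NAME="CINECA">
<CLUSTER NAME="SP5"/>
<CLUSTER NAME="BCC_Linux_Cluster"/>
<CLUSTER NAME="SP5"/>
</GRID>
</GANGLIA_XML>"""

if __name__ == "__main__":
    print(duplicate_cluster_names(SAMPLE))
```

Against a live server one would call
duplicate_cluster_names(fetch_xml("<gmetad_host>")); an empty result means no
cluster name is repeated in that dump.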

It's very strange. We have a simple configuration: each remote cluster
runs gmond on all nodes and gmetad on one node, and a central gmetad runs
on the machine with the web frontend.

Central gmetad config:

web_frontend # grep -v "#" /etc/gmetad.conf

data_source "SP5" 25 xxx.xxx.xxx.xxx:8651
data_source "GNU_Linux_Cluster" 25 xxx.xxx.xxx.xxx:8651
data_source "Cray_XD1_Linux_Cluster" 25 xxx.xxx.xxx.xxx:8651
data_source "Front_End_Cluster" 25 xxx.xxx.xxx.xxx:8651
data_source "BCC_Linux_Cluster" 25 xxx.xxx.xxx.xxx:8651
data_source "BCX_Linux_Cluster" 25 xxx.xxx.xxx.xxx:8651

scalable off

gridname "CINECA"

authority "http://<web_frontend_address>"

rrd_rootdir "/dev/shm/ganglia/rrds"

Remote clusters gmetad config:

clusterN # grep -v "#" /etc/gmetad.conf

data_source "GNU_Linux_Cluster" 25 10.10.12.1 10.10.12.2 10.10.12.100

authority "http://<web_frontend_address>"
trusted_hosts <ip_of_webfrontend>
rrd_rootdir "/dev/shm/rrds"
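
To rule out a duplicated name at the config level, the data_source lines of
both files can be listed mechanically. A small sketch, assuming the usual
`data_source "name" [interval] address...` layout seen above; the helper
names are hypothetical.

```python
import shlex
from collections import Counter


def parse_data_sources(conf_text):
    """Return a list of (name, interval_or_None, [addresses]) tuples."""
    sources = []
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tokens = shlex.split(line)  # keeps the quoted name as one token
        if tokens[0] != "data_source" or len(tokens) < 3:
            continue
        name, rest = tokens[1], tokens[2:]
        interval = None
        if rest[0].isdigit():  # the polling interval is optional
            interval, rest = int(rest[0]), rest[1:]
        sources.append((name, interval, rest))
    return sources


def duplicate_source_names(sources):
    """Return the data_source names that are declared more than once."""
    counts = Counter(name for name, _, _ in sources)
    return sorted(name for name, n in counts.items() if n > 1)


# The remote-cluster config from this message, used as sample input.
REMOTE_CONF = '''
data_source "GNU_Linux_Cluster" 25 10.10.12.1 10.10.12.2 10.10.12.100

authority "http://<web_frontend_address>"
trusted_hosts <ip_of_webfrontend>
rrd_rootdir "/dev/shm/rrds"
'''

if __name__ == "__main__":
    parsed = parse_data_sources(REMOTE_CONF)
    print(parsed)
    print(duplicate_source_names(parsed))
```

This only catches names repeated within one gmetad.conf; cluster names set
in gmond.conf on the nodes would still need to be compared separately.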

With ganglia versions newer than 3.0.2 we get hundreds of these errors, for
all the clusters and all the metrics:

Jul 30 17:33:25 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/BCC_Linux_Cluster/__SummaryInfo__/mem_free.rrd):
illegal attempt to update using time 1185809583 when last update time is
1185809583 (minimum one second step) 
Jul 30 17:33:25 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/BCC_Linux_Cluster/__SummaryInfo__/cpu_system.rrd): 
illegal attempt to update using time 1185809583 when last update time is 
1185809583 (minimum one second step) 
Jul 30 17:33:25 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/BCC_Linux_Cluster/__SummaryInfo__/proc_run.rrd):
illegal attempt to update using time 1185809583 when last update time is
1185809583 (minimum one second step) 
Jul 30 17:33:25 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/BCC_Linux_Cluster/__SummaryInfo__/mem_total.rrd):
illegal attempt to update using time 1185809583 when last update time is
1185809583 (minimum one second step) 
[..]
Jul 30 17:33:33 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/SP5/__SummaryInfo__/disk_free.rrd): illegal
attempt to update using time 1185809605 when last update time is
1185809605 (minimum one second step) 
Jul 30 17:33:33 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/SP5/__SummaryInfo__/bytes_out.rrd): illegal
attempt to update using time 1185809605 when last update time is
1185809605 (minimum one second step) 
Jul 30 17:33:33 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/SP5/__SummaryInfo__/proc_total.rrd): illegal
attempt to update using time 1185809605 when last update time is
1185809605 (minimum one second step) 
[..]
Jul 30 17:33:36 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/disk_free.rrd): 
illegal attempt to update using time 1185809606 when last update time is 
1185809606 (minimum one second step) 
Jul 30 17:33:36 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/bytes_out.rrd): 
illegal attempt to update using time 1185809606 when last update time is 
1185809606 (minimum one second step) 
Jul 30 17:33:36 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/proc_total.rrd): 
illegal attempt to update using time 1185809606 when last update time is 
1185809606 (minimum one second step) 
Jul 30 17:33:36 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/cpu_nice.rrd): 
illegal attempt to update using time 1185809606 when last update time is 
1185809606 (minimum one second step) 
Jul 30 17:33:36 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/pkts_in.rrd): 
illegal attempt to update using time 1185809606 when last update time is 
1185809606 (minimum one second step) 

And so on...

With ganglia 3.0.2 we see the errors only for the disk_free metric:

Jul 30 17:42:59 tanabis /usr/sbin/gmetad[13501]: RRD_update
(/dev/shm/ganglia/rrds/SP5/__SummaryInfo__/disk_free.rrd): illegal
attempt to update using time 1185810165 when last update time is
1185810165 (minimum one second step) 
Jul 30 17:43:02 tanabis /usr/sbin/gmetad[13501]: RRD_update
(/dev/shm/ganglia/rrds/BCC_Linux_Cluster/__SummaryInfo__/disk_free.rrd):
illegal attempt to update using time 1185810173 when last update time is
1185810173 (minimum one second step) 
Jul 30 17:43:05 tanabis /usr/sbin/gmetad[13501]: RRD_update
(/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/disk_free.rrd): 
illegal attempt to update using time 1185810167 when last update time is 
1185810167 (minimum one second step) 
Jul 30 17:43:08 tanabis /usr/sbin/gmetad[13501]: RRD_update
(/dev/shm/ganglia/rrds/GNU_Linux_Cluster/__SummaryInfo__/disk_free.rrd):
illegal attempt to update using time 1185810168 when last update time is
1185810168 (minimum one second step) 
Jul 30 17:43:11 tanabis /usr/sbin/gmetad[13501]: RRD_update
(/dev/shm/ganglia/rrds/Front_End_Cluster/__SummaryInfo__/disk_free.rrd):
illegal attempt to update using time 1185810171 when last update time is
1185810171 (minimum one second step) 
Jul 30 17:43:13 tanabis /usr/sbin/gmetad[13501]: RRD_update
(/dev/shm/ganglia/rrds/BCX_Linux_Cluster/__SummaryInfo__/disk_free.rrd):
illegal attempt to update using time 1185810183 when last update time is
1185810183 (minimum one second step) 
Jul 30 17:43:26 tanabis /usr/sbin/gmetad[13501]: RRD_update
(/dev/shm/ganglia/rrds/SP5/__SummaryInfo__/disk_free.rrd): illegal
attempt to update using time 1185810186 when last update time is
1185810186 (minimum one second step) 
Jul 30 17:43:28 tanabis /usr/sbin/gmetad[13501]: RRD_update
(/dev/shm/ganglia/rrds/BCC_Linux_Cluster/__SummaryInfo__/disk_free.rrd):
illegal attempt to update using time 1185810196 when last update time is
1185810196 (minimum one second step) 
Jul 30 17:43:28 tanabis /usr/sbin/gmetad[13501]: RRD_update
(/dev/shm/ganglia/rrds/Cray_XD1_Linux_Cluster/__SummaryInfo__/disk_free.rrd): 
illegal attempt to update using time 1185810187 when last update time is 
1185810187 (minimum one second step) 
Jul 30 17:43:32 tanabis /usr/sbin/gmetad[13501]: RRD_update
(/dev/shm/ganglia/rrds/GNU_Linux_Cluster/__SummaryInfo__/disk_free.rrd):
illegal attempt to update using time 1185810193 when last update time is
1185810193 (minimum one second step) 
Jul 30 17:43:34 tanabis /usr/sbin/gmetad[13501]: RRD_update
(/dev/shm/ganglia/rrds/Front_End_Cluster/__SummaryInfo__/disk_free.rrd):
illegal attempt to update using time 1185810192 when last update time is
1185810192 (minimum one second step)
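
In every entry above the rejected timestamp equals the last update time, so
the same summary RRD is being updated twice within one second. To see which
clusters and metrics are affected, the log can be tallied with something
like the sketch below. It assumes the message format shown in this mail
(whitespace-tolerant, since the lines are wrapped here); the function name
is hypothetical.

```python
import re
from collections import Counter

# Matches one RRD_update error for a summary RRD, even when wrapped over lines.
PATTERN = re.compile(
    r"RRD_update\s*\(\S*/([^/\s]+)/__SummaryInfo__/([^/\s]+)\.rrd\):"
    r"\s+illegal\s+attempt\s+to\s+update\s+using\s+time\s+(\d+)"
    r"\s+when\s+last\s+update\s+time\s+is\s+(\d+)"
)


def summarize(log_text):
    """Return (errors per (cluster, metric), number of same-second updates)."""
    counts = Counter()
    same_second = 0
    for cluster, metric, new_time, last_time in PATTERN.findall(log_text):
        counts[(cluster, metric)] += 1
        if new_time == last_time:
            same_second += 1
    return counts, same_second


# Two entries copied from the log above, wrapped exactly as in this mail.
SAMPLE = """Jul 30 17:33:33 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/SP5/__SummaryInfo__/disk_free.rrd): illegal
attempt to update using time 1185809605 when last update time is
1185809605 (minimum one second step)
Jul 30 17:33:33 tanabis /usr/sbin/gmetad[12217]: RRD_update
(/dev/shm/ganglia/rrds/SP5/__SummaryInfo__/bytes_out.rrd): illegal
attempt to update using time 1185809605 when last update time is
1185809605 (minimum one second step)
"""

if __name__ == "__main__":
    per_metric, duplicates = summarize(SAMPLE)
    for (cluster, metric), n in sorted(per_metric.items()):
        print(cluster, metric, n)
    print("same-second updates:", duplicates)
```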

Best Regards
-- 
Andrea Capriotti
System Management Group - Cineca - www.cineca.it
[EMAIL PROTECTED] - Tel +39 051 6171890

