jonathan-

i'm just guessing here.. 

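when you say you've removed metrics from ganglia.. you mean that you 
modified ./gmond/metric.h and ./gmond/key_metrics.h and removed some 
metrics from them.. right?  it's important that the key_metrics.h and 
metric.h headers match: each entry in the key_metrics.h enum has to line 
up with the entry at the same position in the metric.h array.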

i would run md5sum on the gmond binary on every node in your cluster to 
make sure they all match:

% md5sum /usr/sbin/gmond

if even a few gmonds on the multicast channel don't match, you'll have 
problems: one gmond will send one metric and another gmond will interpret 
it as a different one.  from the messages in syslog, it looks like a gmond 
is expecting a string metric but isn't getting one.

the error message below comes from line 186 of ./gmond/listen.c.  i 
suspect that your ./gmond/key_metrics.h enum and your ./gmond/metric.h 
array don't match.
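one way to keep the two headers from drifting apart again is a compile-time 
check. the sketch below is hypothetical (the enum members, metric_names[] 
and NUM_KEY_METRICS are illustrative names, not what's in the ganglia 
tree) and needs a C11 compiler for static_assert: it adds a count sentinel 
as the last enum member and refuses to build if the array length differs.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */
#include <string.h>   /* strcmp, for checking entries */

/* Hypothetical key list in the style of key_metrics.h. */
enum key_metric {
    KEY_CPU_NUM,
    KEY_OS_NAME,
    KEY_MACHINE_TYPE,
    NUM_KEY_METRICS     /* count sentinel: must stay last */
};

/* Hypothetical table in the style of metric.h: entry i describes enum value i. */
static const char *const metric_names[] = {
    "cpu_num",
    "os_name",
    "machine_type",
};

/* Compilation fails if an entry is removed from one list but not the other. */
static_assert(sizeof metric_names / sizeof metric_names[0]
                  == (size_t)NUM_KEY_METRICS,
              "key_metrics.h enum and metric.h array are out of sync");
```

it only compares counts, so reordering entries in one file but not the 
other would still slip through.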

let me know.. good luck
-matt



Today, Jonathan Pauli wrote forth saying...

> From: Jonathan Pauli <[EMAIL PROTECTED]>
> To: [email protected]
> Date: Thu, 28 Aug 2003 13:04:19 -0500 (CDT)
> Subject: [Ganglia-general] syslog messages
> 
> I recently removed metrics from ganglia and recompiled/redistributed 
> across our 300-node cluster.
> 
> We have since noticed errors like these filling the logs.
> /var/log/messages.x is normally around 320 KB, but we've been
> seeing files of up to 25 MB filled with these errors.
> 
> 
> We also tried to add metrics via gmetric. At the same time we noticed
> a huge increase in CPU usage, but this was likely due to people
> using the cluster during maintenance. 
> 
> Any ideas as to what these errors mean? Is this directly related to the 
> recompilation?
> 
> 
> Thanks in advance.
> 
> 
> Aug 28 12:07:04 medusa-slave001 /usr/sbin/gmond[20524]: pre_process_node() failed to get node location (0)
> Aug 28 12:07:09 medusa-slave001 /usr/sbin/gmond[20524]: pre_process_node() failed to get node location (0)
> Aug 28 12:07:09 medusa-slave001 /usr/sbin/gmond[20525]: pre_process_node() failed to get node location (0
> 
> 
> Aug 25 08:45:58 medusa-slave001 /usr/sbin/gmond[20524]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:46:29 medusa-slave001 /usr/sbin/gmond[20525]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:46:34 medusa-slave001 /usr/sbin/gmond[20525]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:46:45 medusa-slave001 /usr/sbin/gmond[20524]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:47:01 medusa-slave001 /usr/sbin/gmond[20525]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:47:35 medusa-slave001 /usr/sbin/gmond[20524]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:47:47 medusa-slave001 /usr/sbin/gmond[20525]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:48:04 medusa-slave001 /usr/sbin/gmond[20524]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:48:20 medusa-slave001 /usr/sbin/gmond[20525]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:48:36 medusa-slave001 /usr/sbin/gmond[20524]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:49:07 medusa-slave001 /usr/sbin/gmond[20524]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:50:37 medusa-slave001 last message repeated 4 times
> 
> _______________________________________________
> Ganglia-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
> 

