jonathan- i'm just guessing here..
when you say you've removed metrics from ganglia.. you are saying that you
modified ./gmond/metric.h and ./gmond/key_metrics.h and removed some
metrics.. right?  it's important that the key_metrics.h and metric.h headers
match.

i would run an md5sum on the gmond on each node in your cluster

  % md5sum /usr/sbin/gmond

for every node in the cluster to make sure they match.  if you have a few
gmond on the multicast channel that don't match you'll have problems.  one
gmond will be sending one metric and the other gmond will interpret it as a
different one.

it looks from the messages in syslog that a gmond is expecting a string
metric but isn't getting one.  the error message below comes from line 186 of
file ./gmond/listen.c.  i suspect that your ./gmond/key_metrics.h enum
and your ./gmond/metric.h array don't match.

let me know.. good luck
-matt

Today, Jonathan Pauli wrote forth saying...

> From: Jonathan Pauli <[EMAIL PROTECTED]>
> To: [email protected]
> Date: Thu, 28 Aug 2003 13:04:19 -0500 (CDT)
> Subject: [Ganglia-general] syslog messages
>
> I recently removed metrics from ganglia and recompiled/redistributed
> across our 300 node cluster.
>
> We have since noticed errors like these filling the logs.
> /var/log messages.x is regularly 320k or so, but we've been
> seeing up to 25 Mb files filled with these errors.
>
> We also tried to add metrics via gmetric.  At the time we noticed
> a huge increase in CPU usage, but this was likely due to people
> using the cluster during maintenance.
>
> Any ideas as to what these errors mean?  Is this directly related to the
> recompilation?
>
> Thanks in advance.
>
> Aug 28 12:07:04 medusa-slave001 /usr/sbin/gmond[20524]: pre_process_node() failed to get node location (0)
> Aug 28 12:07:09 medusa-slave001 /usr/sbin/gmond[20524]: pre_process_node() failed to get node location (0)
> Aug 28 12:07:09 medusa-slave001 /usr/sbin/gmond[20525]: pre_process_node() failed to get node location (0
>
> Aug 25 08:45:58 medusa-slave001 /usr/sbin/gmond[20524]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:46:29 medusa-slave001 /usr/sbin/gmond[20525]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:46:34 medusa-slave001 /usr/sbin/gmond[20525]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:46:45 medusa-slave001 /usr/sbin/gmond[20524]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:47:01 medusa-slave001 /usr/sbin/gmond[20525]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:47:35 medusa-slave001 /usr/sbin/gmond[20524]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:47:47 medusa-slave001 /usr/sbin/gmond[20525]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:48:04 medusa-slave001 /usr/sbin/gmond[20524]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:48:20 medusa-slave001 /usr/sbin/gmond[20525]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:48:36 medusa-slave001 /usr/sbin/gmond[20524]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:49:07 medusa-slave001 /usr/sbin/gmond[20524]: mcast_listen_thread() xdr_string() error: Interrupted system call
> Aug 25 08:50:37 medusa-slave001 last message repeated 4 times
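
for the md5sum check suggested in the reply, here is a rough sketch of one way to compare the results once you have one line of `md5sum /usr/sbin/gmond` output per node. how the lines get collected (an ssh loop, a parallel shell, whatever you already use) is not shown. the function names, hostnames and digests are all made up for illustration and none of this is part of ganglia. it also assumes the most common digest is the build you meant to deploy, which may not hold if the redistribution only reached part of the cluster.

```python
from collections import Counter

HEX_DIGITS = set("0123456789abcdef")


def parse_md5sum_line(line):
    """Return the hex digest from one line of md5sum output.

    md5sum prints "<32 hex chars>  <filename>"; only the digest is kept.
    """
    parts = line.split()
    if not parts:
        raise ValueError("empty md5sum line")
    digest = parts[0].lower()
    if len(digest) != 32 or not set(digest) <= HEX_DIGITS:
        raise ValueError(f"not an md5sum line: {line!r}")
    return digest


def find_odd_nodes(md5_by_host):
    """Given {hostname: md5sum output line}, return (majority_digest, outliers).

    outliers maps every host whose digest differs from the most common
    digest to that host's digest. Assumption: the most common digest is
    the intended build.
    """
    digests = {host: parse_md5sum_line(line) for host, line in md5_by_host.items()}
    if not digests:
        return None, {}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    outliers = {host: d for host, d in digests.items() if d != majority}
    return majority, outliers


if __name__ == "__main__":
    # Hypothetical hostnames and digests, purely for illustration.
    sample = {
        "node001": "0123456789abcdef0123456789abcdef  /usr/sbin/gmond",
        "node002": "0123456789abcdef0123456789abcdef  /usr/sbin/gmond",
        "node003": "fedcba9876543210fedcba9876543210  /usr/sbin/gmond",
    }
    majority, outliers = find_odd_nodes(sample)
    print("most common digest:", majority)
    for host, digest in sorted(outliers.items()):
        print("MISMATCH", host, digest)
```

any node it flags is one where the gmond binary differs from the rest, so that is where to look first before digging into the headers themselves.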

