> that is the same problem.  heartbeat messages are sent every 15 seconds
> so if a machine doesn't get a heartbeat message in 60 seconds (4 missed
> heartbeats) it assumes it is down.  if you use the latest CVS source you
> should see the problem no longer is there.  let me know otherwise.
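The timeout rule quoted above (a heartbeat every 15 seconds, a host declared down after 60 seconds of silence, i.e. 4 missed heartbeats) can be sketched as below. This is an illustration of the rule only, not gmond's code: `heartbeat_timeout()`, `host_is_down()` and the two constants are hypothetical names, and treating exactly 60 seconds of silence as "still up" is an assumption about the boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical constants matching the numbers quoted above. */
#define HEARTBEAT_INTERVAL_SECS 15
#define MISSED_HEARTBEATS_ALLOWED 4

/* Seconds of silence after which a host counts as down (15 * 4 = 60). */
time_t heartbeat_timeout(time_t interval_secs, int missed_allowed)
{
    assert(interval_secs > 0 && missed_allowed > 0);
    return interval_secs * missed_allowed;
}

/* True if no heartbeat has been received from the host within the timeout.
 * `last_heartbeat` is when the latest heartbeat arrived and `now` is the
 * current time, both in seconds since the epoch. */
bool host_is_down(time_t last_heartbeat, time_t now)
{
    if (now < last_heartbeat)
        return false; /* clock stepped backwards: don't declare the host down */

    /* Strictly greater: exactly 60 s of silence still counts as alive
     * (an assumed boundary choice). */
    return now - last_heartbeat >
           heartbeat_timeout(HEARTBEAT_INTERVAL_SECS, MISSED_HEARTBEATS_ALLOWED);
}
```

For example, a host last heard from at t=1000 is still up at t=1059 and t=1060, and is reported down from t=1061 onwards.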

Just checked out the latest CVS (at 13:06 NZST). It looks like that
problem is fixed. The REPORTED value looks like it's incrementing as it
should... sweet!

BUT

Gstat doesn't like talking to gmond (this happens whether I run gstat
on the local machine or connect remotely).

The following output is from gmond running with debug = 6:

<snip>
4 pre_process_node() remote_ip=10.0.1.130
pre_process_node() HOSTNAME =tycho.peace.co.nz
pre_process_node() TIMESTAMP=1028682333
pre_process_node() HASHP    =100128bc0
pre_process_node() USER_HASHP=1001290e0
pre_process_node() returning the ganglia internal hash pointer 100128bc0
mcast_listen_thread() got internal hash 100128bc0
mcast_listen_thread() built metricdata struct
mcast_listen_thread() attempting to hash_insert_data
mcast_listen_thread() inserted data into 100128bc0
server_thread() 6 clientfd = 11

sent data to host 127.0.0.1
Broken Pipe
</snip>
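The log ends with "Broken Pipe" right after "sent data to host 127.0.0.1". One well-known way a Unix server dies like that: if the client closes its end of the connection before the server has finished writing, the next write() raises SIGPIPE, whose default action terminates the process (shells typically report this as "Broken pipe"). Whether that is what is happening inside gmond here is an assumption, not something the log proves. A minimal sketch of the usual defence, assuming a POSIX system: ignore SIGPIPE and treat EPIPE from write() as "client went away". `ignore_sigpipe()` and `write_all()` are hypothetical helpers, not gmond functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Ignore SIGPIPE process-wide so that writing to a socket or pipe whose
 * reader has gone away fails with EPIPE instead of killing the process.
 * Returns 0 on success, -1 on error (as sigaction does). */
int ignore_sigpipe(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Write all `len` bytes of `buf` to `fd`, retrying on short writes and
 * EINTR. Returns false on any other error, including EPIPE when the client
 * disconnected; the caller should close `fd` and keep serving others. */
bool write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: try again */
            return false; /* errno tells the caller why (e.g. EPIPE) */
        }
        buf += n;
        len -= (size_t)n;
    }
    return true;
}
```

On Linux, passing `MSG_NOSIGNAL` to `send()` suppresses the signal for a single call instead of changing the disposition process-wide.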

However, telnetting to the port works fine (I can do this as often as
I like and gmond stays alive; only running gstat kills it, and straight
away):

lilo:~# telnet tycho 8649
Trying 10.0.1.130...
Connected to tycho.peace.co.nz.
Escape character is '^]'.
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE GANGLIA_XML [
   <!ELEMENT GANGLIA_XML (CLUSTER)+>
   <!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED
                         SOURCE  CDATA #REQUIRED>

Etc, etc, etc (all the stats are there)...

Any ideas about this one?

Thanks for the prompt and helpful answers.

James

