> that is the same problem. heartbeat messages are sent every 15 seconds, so
> if a machine doesn't get a heartbeat message in 60 seconds (4 missed
> heartbeats) it assumes it is down. if you use the latest CVS source you
> should see the problem no longer is there. let me know otherwise.
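For anyone following along, the timeout rule quoted above can be sketched roughly as below. This is only an illustration in Python, not gmond's actual code: the names `HEARTBEAT_INTERVAL`, `HOST_TIMEOUT`, `missed_heartbeats` and `is_host_down` are made up for the example, and treating exactly 60 seconds of silence as "down" is an assumption.

```python
# Values taken from the quoted message; everything else is hypothetical.
HEARTBEAT_INTERVAL = 15  # seconds between heartbeat messages
HOST_TIMEOUT = 60        # seconds of silence before a host is assumed down


def missed_heartbeats(last_heard, now):
    """Number of whole heartbeat intervals that have passed in silence."""
    elapsed = max(0, int(now - last_heard))
    return elapsed // HEARTBEAT_INTERVAL


def is_host_down(last_heard, now):
    """True once no heartbeat has arrived for HOST_TIMEOUT seconds.

    Assumption: the boundary case (exactly 60 seconds) counts as down.
    """
    return (now - last_heard) >= HOST_TIMEOUT
```

With the TIMESTAMP style used in the debug output (seconds since the epoch), a host last heard at 1028682333 would be treated as down from 1028682393 onwards.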
Just checked out the latest CVS (at 13:06 NZST). It looks like that
problem is fixed. The REPORTED value looks like it's incrementing as it
should... sweet!
BUT
gstat doesn't like talking to gmond. This happens whether I connect
using gstat on the local machine or remotely.
The following is from gmond running with debug = 6:
<snip>
4 pre_process_node() remote_ip=10.0.1.130
pre_process_node() HOSTNAME =tycho.peace.co.nz
pre_process_node() TIMESTAMP=1028682333
pre_process_node() HASHP =100128bc0
pre_process_node() USER_HASHP=1001290e0
pre_process_node() returning the ganglia internal hash pointer 100128bc0
mcast_listen_thread() got internal hash 100128bc0
mcast_listen_thread() built metricdata struct
mcast_listen_thread() attempting to hash_insert_data
mcast_listen_thread() inserted data into 100128bc0
server_thread() 6 clientfd = 11
sent data to host 127.0.0.1
Broken Pipe
</snip>
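One possible reading of the log above (a guess, not something the debug output confirms) is that the client closes its end of the connection before the server has finished writing, so the write fails with a broken pipe. As a minimal sketch of how a server can survive that, here is a Python example; `send_or_drop` is a hypothetical helper, and this is not how gmond itself is written. Python ignores SIGPIPE by default, so the failed write shows up as an exception rather than killing the process.

```python
import socket


def send_or_drop(sock, data):
    """Try to send all of data; return False if the peer has gone away.

    Hypothetical helper: a failed write is treated as a reason to drop
    this one client, not as a fatal error for the whole server.
    """
    try:
        sock.sendall(data)
        return True
    except (BrokenPipeError, ConnectionResetError):
        # The client disconnected mid-transfer; give up on this client only.
        return False
    finally:
        sock.close()
```

The point of the design is simply that the thread serving one client reports the failure and returns, while the rest of the daemon keeps running.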
However, telnetting to the port works fine. I can do this as much as
I like and gmond stays alive, whereas simply running gstat kills it
straight away:
lilo:~# telnet tycho 8649
Trying 10.0.1.130...
Connected to tycho.peace.co.nz.
Escape character is '^]'.
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE GANGLIA_XML [
<!ELEMENT GANGLIA_XML (CLUSTER)+>
<!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED
SOURCE CDATA #REQUIRED>
Etc, etc, etc (all the stats are there)...
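The telnet check above can also be scripted. The following is a rough Python sketch that connects to the same port, reads until gmond closes the connection, and returns the XML as text. `fetch_gmond_xml` is a made-up name, the 10 second timeout is an arbitrary choice, and the decoding simply follows the ISO-8859-1 encoding declared in the XML header.

```python
import socket


def fetch_gmond_xml(host, port=8649, timeout=10.0):
    """Read everything the server sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the server closed the socket
                break
            chunks.append(data)
    # The XML declaration in the output names ISO-8859-1 as the encoding.
    return b"".join(chunks).decode("iso-8859-1")


if __name__ == "__main__":
    # Host name taken from the telnet session above.
    print(fetch_gmond_xml("tycho"))
```

Because this client reads to end-of-file before closing, it behaves like the telnet session rather than like whatever gstat is doing.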
Any ideas about this one?
Thanks for the prompt and helpful answers.
James