I don't use gstat. It was broken when I first dared to attempt my Solaris
port and I never bothered to fix it once I realized that it was basically a
command-line app that connects to localhost:8649 and parses the output into
an ASCII table.
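
For anyone curious what "connect to localhost:8649 and parse the output into an ASCII table" amounts to, here is a rough Python sketch. It is not gstat's actual code. The HOST element and its NAME, IP and REPORTED attributes are assumptions about the gmond XML, and fetch_xml and hosts_table are made-up names.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host="localhost", port=8649, timeout=5.0):
    """Read everything gmond sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_table(xml_bytes):
    """Return an ASCII table with one row per HOST element.

    Assumption: hosts appear as <HOST NAME=... IP=... REPORTED=...>.
    """
    root = ET.fromstring(xml_bytes)
    rows = [("HOST", "IP", "REPORTED")]
    for host in root.iter("HOST"):
        rows.append((
            host.get("NAME", "?"),
            host.get("IP", "?"),
            host.get("REPORTED", "?"),
        ))
    # Pad every column to the width of its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )
```

Something like print(hosts_table(fetch_xml("solaris_host"))) would then give a crude host listing from any machine that can reach the port.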
At first I thought the problem was that *gmetric* wasn't working ... whew!
Does the debug value mean anything, Matt? In all the debug_msg's that I've
written, none of them seem to support verbosity weights, service masks, or
what-have-you. Just wondering if I'm gonna have to go back and change
stuff if this is on the to-do list. :)
I would *guess* that gstat is attempting to use the Solaris socket library
in a Linux manner and the result may not be kosher (I halfheartedly tossed
a binary together and unleashed it on a fileserver):
> sol.gstat
tcp_connect() setsockopt() TCP_NODELAY error: Invalid argument
Unable to get hostlist from 127.0.0.1 8649!
Looks like gmond's still up.
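
TCP_NODELAY only turns off Nagle's algorithm, so a client can in principle carry on without it when the setsockopt() call fails. The Python sketch below shows that idea only. It does not explain why Solaris returned "Invalid argument", and it is not how gstat's tcp_connect() is written. The names set_nodelay_best_effort and connect_to_gmond are invented for illustration.

```python
import socket


def set_nodelay_best_effort(sock):
    """Try to enable TCP_NODELAY; report failure instead of raising."""
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return True
    except OSError as exc:
        # The option is a latency tweak, so a failure here is only a warning.
        print("warning: TCP_NODELAY not set:", exc)
        return False


def connect_to_gmond(host="127.0.0.1", port=8649, timeout=5.0):
    """Connect first, then apply the optional socket tuning."""
    sock = socket.create_connection((host, port), timeout=timeout)
    set_nodelay_best_effort(sock)
    return sock
```

With this shape, a failed setsockopt() produces a warning and the host list is still fetched.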
As a workaround you could hack a Linux gstat into connecting to
solaris_host:8649 - it would be interesting if that worked.
That's all I got, though. :)
matt massie wrote:
james-
you are running on solaris right? i haven't heard of this problem before
on solaris. steve wagner is our resident ganglia solaris guru so he might
know something that i don't. unless we hear from steve, i'll add it to
the bug list and try to determine exactly what's going on here.
again... thanks so much for the feedback!
-matt
Today, James Braid wrote forth saying...
that is the same problem. heartbeat messages are sent every 15 seconds so
if a machine doesn't get a heartbeat message in 60 seconds (4 missed
heartbeats) it assumes it is down. if you use the latest CVS source you
should see the problem no longer is there. let me know otherwise.
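
The arithmetic above (a heartbeat every 15 seconds, down after 60 seconds of silence) can be written out as a small Python sketch. The function names are made up, and treating exactly 60 seconds as "down" is an assumption. gmond's real check may differ.

```python
HEARTBEAT_INTERVAL = 15  # seconds between heartbeat messages (from the thread)
DOWN_AFTER = 60          # seconds of silence before a host is assumed down


def missed_heartbeats(last_heartbeat, now):
    """Number of whole heartbeat intervals elapsed since the last heartbeat."""
    return max(0, int(now - last_heartbeat) // HEARTBEAT_INTERVAL)


def is_down(last_heartbeat, now):
    """Assumption: a host counts as down once the silence reaches 60 seconds."""
    return (now - last_heartbeat) >= DOWN_AFTER
```

Both arguments are timestamps in seconds, such as the TIMESTAMP values gmond logs.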
Just checked out the latest CVS (at 13:06 NZST). It looks like that
problem is fixed. The REPORTED value looks like it's incrementing as it
should....sweet!
BUT
Gstat doesn't like talking to gmond (this happens whether I try to
connect using gstat on the local machine or remotely).
The following is from gmond running with debug = 6
<snip>
4 pre_process_node() remote_ip=10.0.1.130
pre_process_node() HOSTNAME =tycho.peace.co.nz
pre_process_node() TIMESTAMP=1028682333
pre_process_node() HASHP =100128bc0
pre_process_node() USER_HASHP=1001290e0
pre_process_node() returning the ganglia internal hash pointer 100128bc0
mcast_listen_thread() got internal hash 100128bc0
mcast_listen_thread() built metricdata struct
mcast_listen_thread() attempting to hash_insert_data
mcast_listen_thread() inserted data into 100128bc0
server_thread() 6 clientfd = 11
sent data to host 127.0.0.1
Broken Pipe
</snip>
However, telnetting to the port works fine. I can do this as much as
I like and gmond stays alive, whereas just running gstat kills it
straight away:
lilo:~# telnet tycho 8649
Trying 10.0.1.130...
Connected to tycho.peace.co.nz.
Escape character is '^]'.
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE GANGLIA_XML [
<!ELEMENT GANGLIA_XML (CLUSTER)+>
<!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED
SOURCE CDATA #REQUIRED>
Etc, etc, etc (all the stats are there)...
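
The log above ends with "Broken Pipe" right after the data is sent, but nothing here establishes the cause. One possibility, stated purely as an assumption, is that the client closes its end before the server has finished writing. A server that wants to survive that can treat the failed write as the loss of one client and nothing more. The Python sketch below shows the pattern (Python ignores SIGPIPE by default and raises BrokenPipeError). It is not gmond's code, and send_report is a made-up name.

```python
import socket


def send_report(client, payload):
    """Send payload to one client; return False if the client went away."""
    try:
        client.sendall(payload)
        return True
    except (BrokenPipeError, ConnectionResetError):
        # The peer closed early; drop this client and keep serving others.
        return False
    finally:
        client.close()
```

The calling thread can log a False return and go back to accepting connections.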
Any ideas about this one?
Thanks for the prompt and helpful answers
James
-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
Ganglia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-developers