On Sun, Oct 28, 2007 at 12:01:30PM -0400, Andrew Rowland wrote:
> 
> No difference.  But, I would still expect gstat to work on the x86 when
> I SSH into it and execute locally.  That's not the case though.

No; gmond is crashing on your system because it is generating broken XML.

The behaviour is the same whether you call gstat locally or remotely, telnet
to port 8649, or configure gmetad to poll the data from it.
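
To check the "broken XML" part without telnet, something like the sketch below
could be used. It is only an illustration in Python (not part of ganglia): the
function names are made up here, and it assumes gmond listens on the default
TCP port 8649 and simply dumps its XML to whoever connects.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read everything gmond sends on its TCP port until the socket closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            try:
                data = sock.recv(65536)
            except OSError:
                # Reset or timeout (for example if gmond dies mid-dump):
                # keep whatever arrived before the failure.
                break
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def is_well_formed(xml_bytes):
    """True if the bytes parse as one complete XML document."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False
```

Calling is_well_formed(fetch_gmond_xml()) against the failing node should
return False if the dump is cut short the way the telnet session is.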

Usually the first entry corresponds to yourself (because that is the first
node notification ever seen), and to build that entry gmond needs to do a
reverse DNS lookup to work out the name of the node from the IP the
notification came from, and then create an entry like:

<HOST NAME="dell.sajinet.com.pe" IP="192.168.0.2" REPORTED="1193590222" TN="3"
TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1193590202">
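
The same kind of reverse lookup can be tried by hand on each box. This is a
minimal Python sketch of what the system resolver returns, not gmond's own
code (gmond is written in C); reverse_name is a hypothetical helper, and
falling back to the IP on failure is an assumption made for the example only.

```python
import socket


def reverse_name(ip, resolver=socket.gethostbyaddr):
    """Return the name the system resolver gives for ip, or ip on failure."""
    try:
        name, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return ip
    return name
```

Running reverse_name("172.16.1.101") on the working and the non-working node
should show whether both resolve that address to the same name.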

You will have to restart gmond every time you make a change to confirm
whether it makes a difference, and it would also be better to keep every
other gmond in the same multicast domain down so they don't affect the result.

> I am using dns.  The /etc/nsswitch.conf is identical on both machines.  

Then you have a bigger problem with your glibc, as the lsof output clearly
showed that you were loading only libnss_files and not libnss_dns.

Is the working box showing the same?
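
For comparing the two boxes, the sketch below lists the sources on the hosts:
line of /etc/nsswitch.conf and the glibc module name that goes with each one
(glibc loads libnss_<source>.so.2). It is an illustration in Python; both
function names are hypothetical, and the parsing only covers the usual
one-line "hosts: files dns" layout.

```python
def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, in order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line.startswith("hosts:"):
            continue
        sources = []
        in_action = False
        for token in line[len("hosts:"):].split():
            # Skip action items such as [NOTFOUND=return].
            if token.startswith("["):
                in_action = True
            if not in_action:
                sources.append(token)
            if token.endswith("]"):
                in_action = False
        return sources
    return []


def expected_nss_modules(sources):
    """Map each source to the shared object glibc would load for it."""
    return ["libnss_%s.so.2" % source for source in sources]
```

The result can be checked against the lsof output for the gmond process. As
far as I know glibc loads these modules on demand, so a module normally shows
up only once a lookup has actually reached that source.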

> > I can't reproduce the problem here (using similar names that you do and a
> > similar configuration but with glibc 2.6 in a gentoo 2007.0 x86), getting
> > the output of the XML generated by gmond until it crashes (with a telnet
> > to 8649) or a core dump of gmond could probably help further.
> 
> The results of telnet localhost 8649, which crashes the gmond daemon
> straight away, on the "non-working node."
> 
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
> <!DOCTYPE GANGLIA_XML [
>    <!ELEMENT GANGLIA_XML (GRID|CLUSTER|HOST)*>
>       <!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED>
>       <!ATTLIST GANGLIA_XML SOURCE CDATA #REQUIRED>
>    <!ELEMENT GRID (CLUSTER | GRID | HOSTS | METRICS)*>
>       <!ATTLIST GRID NAME CDATA #REQUIRED>
>       <!ATTLIST GRID AUTHORITY CDATA #REQUIRED>
>       <!ATTLIST GRID LOCALTIME CDATA #IMPLIED>
>    <!ELEMENT CLUSTER (HOST | HOSTS | METRICS)*>
>       <!ATTLIST CLUSTER NAME CDATA #REQUIRED>
>       <!ATTLIST CLUSTER OWNER CDATA #IMPLIED>
>       <!ATTLIST CLUSTER LATLONG CDATA #IMPLIED>
>       <!ATTLIST CLUSTER URL CDATA #IMPLIED>
>       <!ATTLIST CLUSTER LOCALTIME CDATA #REQUIRED>
>    <!ELEMENT HOST (METRIC)*>
>       <!ATTLIST HOST NAME CDATA #REQUIRED>
>       <!ATTLIST HOST IP CDATA #REQUIRED>
>       <!ATTLIST HOST LOCATION CDATA #IMPLIED>
>       <!ATTLIST HOST REPORTED CDATA #REQUIRED>
>       <!ATTLIST HOST TN CDATA #IMPLIED>
>       <!ATTLIST HOST TMAX CDATA #IMPLIED>
>       <!ATTLIST HOST DMAX CDATA #IMPLIED>
>       <!ATTLIST HOST GMOND_STARTED CDATA #IMPLIED>
>    <!ELEMENT METRIC EMPTY>
>       <!ATTLIST METRIC NAME CDATA #REQUIRED>
>       <!ATTLIST METRIC VAL CDATA #REQUIRED>
>       <!ATTLIST METRIC TYPE (string | int8 | uint8 | int16 | uint16 |
> int32 | uint32 | float | double | timestamp) #REQUIRED>
>       <!ATTLIST METRIC UNITS CDATA #IMPLIED>
>       <!ATTLIST METRIC TN CDATA #IMPLIED>
>       <!ATTLIST METRIC TMAX CDATA #IMPLIED>
>       <!ATTLIST METRIC DMAX CDATA #IMPLIED>
>       <!ATTLIST METRIC SLOPE (zero | positive | negative | both |
> unspecified) #IMPLIED>
>       <!ATTLIST METRIC SOURCE (gmond | gmetric) #REQUIRED>
>    <!ELEMENT HOSTS EMPTY>
>       <!ATTLIST HOSTS UP CDATA #REQUIRED>
>       <!ATTLIST HOSTS DOWN CDATA #REQUIRED>
>       <!ATTLIST HOSTS SOURCE (gmond | gmetric | gmetad) #REQUIRED>
>    <!ELEMENT METRICS EMPTY>
>       <!ATTLIST METRICS NAME CDATA #REQUIRED>
>       <!ATTLIST METRICS SUM CDATA #REQUIRED>
>       <!ATTLIST METRICS NUM CDATA #REQUIRED>
>       <!ATTLIST METRICS TYPE (string | int8 | uint8 | int16 | uint16 |
> int32 | uint32 | float | double | timestamp) #REQUIRED>
>       <!ATTLIST METRICS UNITS CDATA #IMPLIED>
>       <!ATTLIST METRICS SLOPE (zero | positive | negative | both |
> unspecified) #IMPLIED>
>       <!ATTLIST METRICS SOURCE (gmond | gmetric) #REQUIRED>
> ]>
> <GANGLIA_XML VERSION="3.0.5" SOURCE="gmond">
> <CLUSTER NAME="clusterfsck" LOCALTIME="1193582978" OWNER="The ReliaFree
> Project" LATLONG="unspecified" URL="http://reliafree.sourceforge.net">
> <HOST NAME="legolas.clusterfsck" IP="172.16.1.101" REPORTED="1193582961"
> TN="17" TMAX="20" DMAX="0" LOCATION="unspecified"
> GMOND_STARTED="1193582961">

This output still shows the fictitious FQDN.
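
Because the dump stops before the XML is closed, a normal XML parser will
refuse it. A rough Python sketch for pulling the reported names out of a
truncated dump anyway is shown here; reported_hosts is a hypothetical helper
and it assumes NAME comes right before IP in each HOST tag, as in the output
above.

```python
import re

# NAME followed by IP, possibly separated by a line break.
HOST_TAG = re.compile(r'<HOST\s+NAME="([^"]*)"\s+IP="([^"]*)"')


def reported_hosts(xml_text):
    """Return (name, ip) pairs for every HOST tag, even in a cut-off dump."""
    return HOST_TAG.findall(xml_text)
```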

# pkill gmond
# /usr/sbin/gmond -t > /etc/gmond.conf
# /usr/sbin/gmond

Still crashes? Then your last resort is to compile gmond with debugging
enabled (-g) and get a core dump out of it.
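
A core file is only written if the core size limit allows it. The Python
sketch below raises the soft limit to the hard limit before starting gmond, so
the daemon inherits it. The helper names are made up for this example, the
gmond path is the one used above, and where the core file ends up (if
anywhere) still depends on the kernel's core_pattern setting and on the
daemon's working directory.

```python
import resource
import subprocess


def enable_core_dumps():
    """Raise the soft core-file limit to the hard limit; return the new pair."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def start_gmond(path="/usr/sbin/gmond"):
    """Start gmond as a child process so that it inherits the raised limit."""
    enable_core_dumps()
    return subprocess.call([path])
```

If the hard limit itself is 0, this changes nothing and the limit has to be
raised by root first.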

But more and more this looks like a system configuration issue rather than a
gmond problem (even though I agree that gmond crashing is a bug regardless).

Please open a bug with that information at:

  http://bugzilla.ganglia.info/

Carlo

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
