Hello
The Weizmann Institute farm runs plain gmond happily on multiple machines,
except for one.
On that machine the gmond service does run and does respond to 'telnet
localhost 8649',
but the response consists only of the XML DTD and an empty CLUSTER element,
with no hosts and no collected data.
The 'failing' machine serves as the cluster gateway and differs from the
other machines in the following ways:
- It has two (active) network cards rather than one.
- Its security policy is tighter than that of the other machines.
(Note, however, that the 'failing' machine does respond locally to 'telnet
localhost 8649'.)
- It runs kernel 2.6.9-11.ELsmp, while the other machines run a 2.4 kernel.
Apart from these differences, the hardware and software of this machine are
similar to those of the other machines in the farm.
More details about this machine and about running gmond on it follow below.
Please advise how we can find the reason why this one machine fails to
collect data via gmond.
In particular, please indicate whether the tight security policy could cause
the problem (even though telnet does respond).
Thanks in advance
David Front
SW engineer
particle physics department
Weizmann Institute of Science
Israel
The machine is: dual-CPU Pentium III (Coppermine), 512 MB memory
The kernel is: 2.6.9-11.ELsmp
The linux version is: Red Hat Enterprise Linux AS release 4 (Nahant Update
1)
All machines run gmond 3.0.1.
Replacing 3.0.1 with 3.0.2 on the 'failing' machine did not make a difference.
All machines have the same (default) /etc/gmond.conf
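Since the default configuration exchanges metrics over UDP multicast, one check that might narrow things down is whether any gmond packets reach the gateway at all. The following is only a sketch under assumptions: the group 239.2.11.71 and port 8649 are what we understand the gmond defaults to be, the function names are our own (not part of Ganglia), and interface_ip would have to be set to the address of whichever of the two network cards faces the cluster.

```python
import socket
import struct

MCAST_GROUP = "239.2.11.71"  # assumed gmond default multicast group
MCAST_PORT = 8649            # assumed gmond default port


def membership_request(group, interface_ip="0.0.0.0"):
    """Build the 8-byte ip_mreq structure: group address + local interface."""
    return struct.pack("4s4s",
                       socket.inet_aton(group),
                       socket.inet_aton(interface_ip))


def wait_for_gmond_packet(interface_ip="0.0.0.0", timeout=30.0):
    """Join the multicast group on one interface and wait for one datagram.

    Returns (sender_ip, packet_length), or None if nothing arrives in time.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # Allow sharing the port with other listeners where the OS permits it.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", MCAST_PORT))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(MCAST_GROUP, interface_ip))
        sock.settimeout(timeout)
        try:
            data, sender = sock.recvfrom(65535)
        except socket.timeout:
            return None
        return sender[0], len(data)
    finally:
        sock.close()
```

Calling wait_for_gmond_packet() once per network card address and getting None on both would suggest that multicast traffic is being filtered or routed out of the wrong interface. If the bind fails because gmond already holds the port, gmond may need to be stopped for the duration of the test.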
The output of gstat on the 'failing' machine:
CLUSTER INFORMATION
Name: unspecified
Hosts: 0
Gexec Hosts: 0
Dead Hosts: 0
Localtime: Sun Mar 19 13:56:13 2006
There are no hosts running gexec at this time
The output of 'telnet localhost 8649':
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE GANGLIA_XML [
<!ELEMENT GANGLIA_XML (GRID|CLUSTER|HOST)*>
<!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED>
<!ATTLIST GANGLIA_XML SOURCE CDATA #REQUIRED>
<!ELEMENT GRID (CLUSTER | GRID | HOSTS | METRICS)*>
<!ATTLIST GRID NAME CDATA #REQUIRED>
<!ATTLIST GRID AUTHORITY CDATA #REQUIRED>
<!ATTLIST GRID LOCALTIME CDATA #IMPLIED>
<!ELEMENT CLUSTER (HOST | HOSTS | METRICS)*>
<!ATTLIST CLUSTER NAME CDATA #REQUIRED>
<!ATTLIST CLUSTER OWNER CDATA #IMPLIED>
<!ATTLIST CLUSTER LATLONG CDATA #IMPLIED>
<!ATTLIST CLUSTER URL CDATA #IMPLIED>
<!ATTLIST CLUSTER LOCALTIME CDATA #REQUIRED>
<!ELEMENT HOST (METRIC)*>
<!ATTLIST HOST NAME CDATA #REQUIRED>
<!ATTLIST HOST IP CDATA #REQUIRED>
<!ATTLIST HOST LOCATION CDATA #IMPLIED>
<!ATTLIST HOST REPORTED CDATA #REQUIRED>
<!ATTLIST HOST TN CDATA #IMPLIED>
<!ATTLIST HOST TMAX CDATA #IMPLIED>
<!ATTLIST HOST DMAX CDATA #IMPLIED>
<!ATTLIST HOST GMOND_STARTED CDATA #IMPLIED>
<!ELEMENT METRIC EMPTY>
<!ATTLIST METRIC NAME CDATA #REQUIRED>
<!ATTLIST METRIC VAL CDATA #REQUIRED>
<!ATTLIST METRIC TYPE (string | int8 | uint8 | int16 | uint16 | int32
| uint32 | float | double | timestamp) #REQUIRED>
<!ATTLIST METRIC UNITS CDATA #IMPLIED>
<!ATTLIST METRIC TN CDATA #IMPLIED>
<!ATTLIST METRIC TMAX CDATA #IMPLIED>
<!ATTLIST METRIC DMAX CDATA #IMPLIED>
<!ATTLIST METRIC SLOPE (zero | positive | negative | both |
unspecified) #IMPLIED>
<!ATTLIST METRIC SOURCE (gmond | gmetric) #REQUIRED>
<!ELEMENT HOSTS EMPTY>
<!ATTLIST HOSTS UP CDATA #REQUIRED>
<!ATTLIST HOSTS DOWN CDATA #REQUIRED>
<!ATTLIST HOSTS SOURCE (gmond | gmetric | gmetad) #REQUIRED>
<!ELEMENT METRICS EMPTY>
<!ATTLIST METRICS NAME CDATA #REQUIRED>
<!ATTLIST METRICS SUM CDATA #REQUIRED>
<!ATTLIST METRICS NUM CDATA #REQUIRED>
<!ATTLIST METRICS TYPE (string | int8 | uint8 | int16 | uint16 | int32
| uint32 | float | double | timestamp) #REQUIRED>
<!ATTLIST METRICS UNITS CDATA #IMPLIED>
<!ATTLIST METRICS SLOPE (zero | positive | negative | both |
unspecified) #IMPLIED>
<!ATTLIST METRICS SOURCE (gmond | gmetric) #REQUIRED>
]>
<GANGLIA_XML VERSION="3.0.1" SOURCE="gmond">
<CLUSTER NAME="unspecified" LOCALTIME="1142769277" OWNER="unspecified"
LATLONG="unspecified" URL="unspecified">
</CLUSTER>
</GANGLIA_XML>
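
To compare the dump above with the dumps from the working machines without reading the XML by eye, a small script along these lines could count the CLUSTER, HOST and METRIC elements. It is a minimal sketch: the function names are our own and not part of Ganglia, and it assumes gmond writes the whole document and then closes the connection, as it appears to do under telnet.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=10.0):
    """Read everything gmond sends on its TCP port until it closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize_gmond_xml(xml_bytes):
    """Count CLUSTER, HOST and METRIC elements in a gmond XML dump."""
    root = ET.fromstring(xml_bytes)
    return {
        "clusters": len(root.findall(".//CLUSTER")),
        "hosts": len(root.findall(".//HOST")),
        "metrics": len(root.findall(".//METRIC")),
    }
```

Running summarize_gmond_xml(fetch_gmond_xml()) on the gateway should report one cluster with zero hosts and zero metrics for the dump shown here, whereas a working node would be expected to report at least one host.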