Mr. Front,

My guess is that because you are using two interfaces, gmond is picking the wrong interface for multicast. Use the mcast_if option in the udp_send_channel and udp_recv_channel sections of gmond.conf to constrain multicast traffic to the correct interface.
For example:
   mcast_if = eth1
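
In context, the two channel sections might look roughly like the sketch below. It assumes the stock multicast group 239.2.11.71 and port 8649 from the default gmond.conf, and eth1 is only a placeholder for whichever interface faces the cluster network; keep whatever mcast_join, port and bind values your own file already has and just add the mcast_if lines.

```
/* Sketch only: address and port are the assumed defaults, eth1 is a placeholder */
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  mcast_if = eth1   /* send multicast out of this interface */
}

udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
  mcast_if = eth1   /* join the multicast group on this interface */
}
```

Restart gmond after the change so the new channel settings take effect.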

Ian


david front wrote:
Hello
The Weizmann Institute farm runs plain gmond happily on multiple machines, except for one. On that machine the gmond service does run and does respond to 'telnet localhost 8649', but the response consists only of the list of metrics, with no collected data.

The 'failing' machine serves as a cluster gateway and differs from the others as follows:
- It has two (active) network cards rather than one.
- Its security policy is tighter than that of the other machines. (Note, however, that the 'failing' machine does respond locally to 'telnet localhost 8649'.)
- It runs kernel 2.6.9-11.ELsmp, while the other machines run a 2.4 kernel.

Except for these differences, the hardware and software of this machine look similar to those of the other machines in the farm.

More details about this machine and about running gmond on it follow below. Please guide us on how to find the reason why this one machine fails to collect data via gmond. In particular, please indicate whether the tight security policy may cause the problem (even though telnet does respond).

Thanks in advance
    David Front
        SW engineer
        particle physics department
        Weizmann Institute of Science
        Israel
The machine is: dual-CPU Pentium III (Coppermine), 512 MB memory
The kernel is: 2.6.9-11.ELsmp
The linux version is: Red Hat Enterprise Linux AS release 4 (Nahant Update 1)
All machines run gmond 3.0.1.
Replacing 3.0.1 with 3.0.2 on the 'failing' machine did not make a difference.
All machines have the same (default) /etc/gmond.conf
The output of gstat on the 'failing' machine:
    CLUSTER INFORMATION
    Name: unspecified
    Hosts: 0
    Gexec Hosts: 0
    Dead Hosts: 0
    Localtime: Sun Mar 19 13:56:13 2006
    There are no hosts running gexec at this time
The output of 'telnet localhost 8649':
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE GANGLIA_XML [
   <!ELEMENT GANGLIA_XML (GRID|CLUSTER|HOST)*>
      <!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED>
      <!ATTLIST GANGLIA_XML SOURCE CDATA #REQUIRED>
   <!ELEMENT GRID (CLUSTER | GRID | HOSTS | METRICS)*>
      <!ATTLIST GRID NAME CDATA #REQUIRED>
      <!ATTLIST GRID AUTHORITY CDATA #REQUIRED>
      <!ATTLIST GRID LOCALTIME CDATA #IMPLIED>
   <!ELEMENT CLUSTER (HOST | HOSTS | METRICS)*>
      <!ATTLIST CLUSTER NAME CDATA #REQUIRED>
      <!ATTLIST CLUSTER OWNER CDATA #IMPLIED>
      <!ATTLIST CLUSTER LATLONG CDATA #IMPLIED>
      <!ATTLIST CLUSTER URL CDATA #IMPLIED>
      <!ATTLIST CLUSTER LOCALTIME CDATA #REQUIRED>
   <!ELEMENT HOST (METRIC)*>
      <!ATTLIST HOST NAME CDATA #REQUIRED>
      <!ATTLIST HOST IP CDATA #REQUIRED>
      <!ATTLIST HOST LOCATION CDATA #IMPLIED>
      <!ATTLIST HOST REPORTED CDATA #REQUIRED>
      <!ATTLIST HOST TN CDATA #IMPLIED>
      <!ATTLIST HOST TMAX CDATA #IMPLIED>
      <!ATTLIST HOST DMAX CDATA #IMPLIED>
      <!ATTLIST HOST GMOND_STARTED CDATA #IMPLIED>
   <!ELEMENT METRIC EMPTY>
      <!ATTLIST METRIC NAME CDATA #REQUIRED>
      <!ATTLIST METRIC VAL CDATA #REQUIRED>
      <!ATTLIST METRIC TYPE (string | int8 | uint8 | int16 | uint16 | int32 | uint32 | float | double | timestamp) #REQUIRED>
      <!ATTLIST METRIC UNITS CDATA #IMPLIED>
      <!ATTLIST METRIC TN CDATA #IMPLIED>
      <!ATTLIST METRIC TMAX CDATA #IMPLIED>
      <!ATTLIST METRIC DMAX CDATA #IMPLIED>
      <!ATTLIST METRIC SLOPE (zero | positive | negative | both | unspecified) #IMPLIED>
      <!ATTLIST METRIC SOURCE (gmond | gmetric) #REQUIRED>
   <!ELEMENT HOSTS EMPTY>
      <!ATTLIST HOSTS UP CDATA #REQUIRED>
      <!ATTLIST HOSTS DOWN CDATA #REQUIRED>
      <!ATTLIST HOSTS SOURCE (gmond | gmetric | gmetad) #REQUIRED>
   <!ELEMENT METRICS EMPTY>
      <!ATTLIST METRICS NAME CDATA #REQUIRED>
      <!ATTLIST METRICS SUM CDATA #REQUIRED>
      <!ATTLIST METRICS NUM CDATA #REQUIRED>
      <!ATTLIST METRICS TYPE (string | int8 | uint8 | int16 | uint16 | int32 | uint32 | float | double | timestamp) #REQUIRED>
      <!ATTLIST METRICS UNITS CDATA #IMPLIED>
      <!ATTLIST METRICS SLOPE (zero | positive | negative | both | unspecified) #IMPLIED>
      <!ATTLIST METRICS SOURCE (gmond | gmetric) #REQUIRED>
]>
<GANGLIA_XML VERSION="3.0.1" SOURCE="gmond">
<CLUSTER NAME="unspecified" LOCALTIME="1142769277" OWNER="unspecified" LATLONG="unspecified" URL="unspecified">
</CLUSTER>
</GANGLIA_XML>