Bernard,
Yes, I meant Ganglia.  Sorry for the confusion, and thanks for your help.
I'm now running into another strange issue.

I have one cluster working fine.  On a second cluster, the head node
monitors itself fine and appears to be aggregating correctly, but
something odd is happening on the compute nodes.

If I generate a default config from gmond, it works fine, and it still
works after I change the cluster name.  By "works fine" I mean that I can
telnet to port 8649 on the node and see the detailed metrics in XML.

However, once I point the send channel's host at my head node and remove
the multicast joins and the bind, a telnet to port 8649 on the compute
node returns only the following:

compute001:/etc/ganglia # telnet localhost 8649
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE GANGLIA_XML [
   <!ELEMENT GANGLIA_XML (GRID|CLUSTER|HOST)*>
      <!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED>
      <!ATTLIST GANGLIA_XML SOURCE CDATA #REQUIRED>
   <!ELEMENT GRID (CLUSTER | GRID | HOSTS | METRICS)*>
      <!ATTLIST GRID NAME CDATA #REQUIRED>
      <!ATTLIST GRID AUTHORITY CDATA #REQUIRED>
      <!ATTLIST GRID LOCALTIME CDATA #IMPLIED>
   <!ELEMENT CLUSTER (HOST | HOSTS | METRICS)*>
      <!ATTLIST CLUSTER NAME CDATA #REQUIRED>
      <!ATTLIST CLUSTER OWNER CDATA #IMPLIED>
      <!ATTLIST CLUSTER LATLONG CDATA #IMPLIED>
      <!ATTLIST CLUSTER URL CDATA #IMPLIED>
      <!ATTLIST CLUSTER LOCALTIME CDATA #REQUIRED>
   <!ELEMENT HOST (METRIC)*>
      <!ATTLIST HOST NAME CDATA #REQUIRED>
      <!ATTLIST HOST IP CDATA #REQUIRED>
      <!ATTLIST HOST LOCATION CDATA #IMPLIED>
      <!ATTLIST HOST REPORTED CDATA #REQUIRED>
      <!ATTLIST HOST TN CDATA #IMPLIED>
      <!ATTLIST HOST TMAX CDATA #IMPLIED>
      <!ATTLIST HOST DMAX CDATA #IMPLIED>
      <!ATTLIST HOST GMOND_STARTED CDATA #IMPLIED>
   <!ELEMENT METRIC (EXTRA_DATA*)>
      <!ATTLIST METRIC NAME CDATA #REQUIRED>
      <!ATTLIST METRIC VAL CDATA #REQUIRED>
      <!ATTLIST METRIC TYPE (string | int8 | uint8 | int16 | uint16 | int32 | uint32 | float | double | timestamp) #REQUIRED>
      <!ATTLIST METRIC UNITS CDATA #IMPLIED>
      <!ATTLIST METRIC TN CDATA #IMPLIED>
      <!ATTLIST METRIC TMAX CDATA #IMPLIED>
      <!ATTLIST METRIC DMAX CDATA #IMPLIED>
      <!ATTLIST METRIC SLOPE (zero | positive | negative | both | unspecified) #IMPLIED>
      <!ATTLIST METRIC SOURCE (gmond) 'gmond'>
   <!ELEMENT EXTRA_DATA (EXTRA_ELEMENT*)>
   <!ELEMENT EXTRA_ELEMENT EMPTY>
      <!ATTLIST EXTRA_ELEMENT NAME CDATA #REQUIRED>
      <!ATTLIST EXTRA_ELEMENT VAL CDATA #REQUIRED>
   <!ELEMENT HOSTS EMPTY>
      <!ATTLIST HOSTS UP CDATA #REQUIRED>
      <!ATTLIST HOSTS DOWN CDATA #REQUIRED>
      <!ATTLIST HOSTS SOURCE (gmond | gmetad) #REQUIRED>
   <!ELEMENT METRICS (EXTRA_DATA*)>
      <!ATTLIST METRICS NAME CDATA #REQUIRED>
      <!ATTLIST METRICS SUM CDATA #REQUIRED>
      <!ATTLIST METRICS NUM CDATA #REQUIRED>
      <!ATTLIST METRICS TYPE (string | int8 | uint8 | int16 | uint16 | int32 | uint32 | float | double | timestamp) #REQUIRED>
      <!ATTLIST METRICS UNITS CDATA #IMPLIED>
      <!ATTLIST METRICS SLOPE (zero | positive | negative | both | unspecified) #IMPLIED>
      <!ATTLIST METRICS SOURCE (gmond) 'gmond'>
]>
<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
<CLUSTER NAME="my cluster" LOCALTIME="1299597872" OWNER="unspecified" LATLONG="unspecified" URL="unspecified">
</CLUSTER>
</GANGLIA_XML>
Connection closed by foreign host.
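
For anyone who wants to repeat this check without reading the raw dump by
eye, here is a minimal Python sketch.  It assumes gmond is listening on
TCP port 8649 as in the telnet session above; the function names
fetch_gmond_xml and summarize are made up for illustration and are not
part of Ganglia.  It only counts the HOST and METRIC elements inside each
CLUSTER element, so an empty cluster like the one above shows up as a
cluster with zero hosts.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read everything gmond writes on its TCP port until it closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize(xml_bytes):
    """Return {cluster name: {host name: number of METRIC elements}}."""
    root = ET.fromstring(xml_bytes)
    summary = {}
    for cluster in root.iter("CLUSTER"):
        hosts = {
            host.get("NAME"): len(host.findall("METRIC"))
            for host in cluster.findall("HOST")
        }
        summary[cluster.get("NAME")] = hosts
    return summary
```

Calling summarize(fetch_gmond_xml("compute001")) and then
summarize(fetch_gmond_xml("head-eth0")) would show which daemon, if any,
is holding the compute node's metrics.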


Here are the differences between my gmond.conf and the default gmond.conf:


compute001:/etc/ganglia # diff /etc/ganglia/gmond.conf /etc/ganglia/gmond.conf.default
23c23
<   name = "my cluster"
---
>   name = "unspecified"
43c43
<   host = head-eth0
---
>   mcast_join = 239.2.11.71
49a50
>   mcast_join = 239.2.11.71
50a52
>   bind = 239.2.11.71


Here is the relevant snippet from my gmond.conf on the compute node:

 22 cluster {
 23   name = "my cluster"
 24   owner = "unspecified"
 25   latlong = "unspecified"
 26   url = "unspecified"
 27 }

 30 host {
 31   location = "unspecified"
 32 }

 36 udp_send_channel {
 43   host = head-eth0
 44   port = 8649
 45   ttl = 1
 46 }

 49 udp_recv_channel {
 50   port = 8649
 51 }

 55 tcp_accept_channel {
 56   port = 8649
 57 }

Does anybody know what I might be doing wrong that prevents the compute 
nodes from reporting their metrics?

Thanks,
Jeff

Bernard Li <[email protected]> 
03/03/2011 04:47 PM

To
Jeffrey L Moon <[email protected]>
cc
[email protected]
Subject
Re: [Ganglia-general] Including a Head Node in Ganglia Monitoring

Hi Jeffrey:

On Thu, Mar 3, 2011 at 2:34 PM, Jeffrey L Moon <[email protected]> wrote:

Thanks for the advice.  I saw somewhere that you can't have the cluster 
name in nagios be the same as the head node.   
Is that the case?  I thought I had tried the option you recommended 
before, but with those being the same and it acted 

I think you meant "Ganglia" as opposed to "Nagios", right?

Anyway, that statement is not true -- where did you read it?

The only thing is that the cluster name displayed in the web frontend is 
actually the name defined in the "name" parameter in the cluster{} clause 
in gmond.conf of the host which the gmetad data_source points to.  The 
name after the data_source parameter in gmetad.conf plays no role in this.
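
To make the distinction concrete, here is a small Python sketch.  The
data_source line, its label, and the helper names are hypothetical and
only serve as an illustration; it simply pulls out the two names so they
can be compared side by side.

```python
import shlex
import xml.etree.ElementTree as ET


def data_source_label(gmetad_line):
    """Return the label that follows data_source on a gmetad.conf line."""
    tokens = shlex.split(gmetad_line)  # handles the quoted label
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line")
    return tokens[1]


def reported_cluster_name(gmond_xml):
    """Return the NAME attribute of the first CLUSTER element in gmond's XML."""
    cluster = ET.fromstring(gmond_xml).find("CLUSTER")
    return None if cluster is None else cluster.get("NAME")


# Hypothetical inputs: a gmetad.conf line and the XML served by that host's gmond.
line = 'data_source "head label" head-eth0:8649'
xml = '<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond"><CLUSTER NAME="my cluster" LOCALTIME="0"/></GANGLIA_XML>'

label = data_source_label(line)          # the name in gmetad.conf
reported = reported_cluster_name(xml)    # the name from cluster{} in gmond.conf
```

With these inputs the two names differ, and it is the second one, taken
from gmond's XML, that corresponds to the cluster{} name described above.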

Cheers,

Bernard 
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
