Bernard,
Yes, I meant Ganglia. Sorry for the confusion, and thanks for your help. I'm
now running into another strange issue.
I have one cluster working fine. On a second cluster, the head node monitors
itself fine and appears to be aggregating correctly, but something odd is
happening on the compute nodes.
If I generate a default_config from gmond, it works fine. I can then
change the cluster name and it still works fine.
By "works fine" I mean that I can telnet to port 8649 on the node and see
the detailed metrics in XML.
But if I then set the udp_send_channel host to my head node and remove the
multicast joins and bind, telnetting to port 8649 on the compute node
returns only the following:
compute001:/etc/ganglia # telnet localhost 8649
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE GANGLIA_XML [
<!ELEMENT GANGLIA_XML (GRID|CLUSTER|HOST)*>
<!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED>
<!ATTLIST GANGLIA_XML SOURCE CDATA #REQUIRED>
<!ELEMENT GRID (CLUSTER | GRID | HOSTS | METRICS)*>
<!ATTLIST GRID NAME CDATA #REQUIRED>
<!ATTLIST GRID AUTHORITY CDATA #REQUIRED>
<!ATTLIST GRID LOCALTIME CDATA #IMPLIED>
<!ELEMENT CLUSTER (HOST | HOSTS | METRICS)*>
<!ATTLIST CLUSTER NAME CDATA #REQUIRED>
<!ATTLIST CLUSTER OWNER CDATA #IMPLIED>
<!ATTLIST CLUSTER LATLONG CDATA #IMPLIED>
<!ATTLIST CLUSTER URL CDATA #IMPLIED>
<!ATTLIST CLUSTER LOCALTIME CDATA #REQUIRED>
<!ELEMENT HOST (METRIC)*>
<!ATTLIST HOST NAME CDATA #REQUIRED>
<!ATTLIST HOST IP CDATA #REQUIRED>
<!ATTLIST HOST LOCATION CDATA #IMPLIED>
<!ATTLIST HOST REPORTED CDATA #REQUIRED>
<!ATTLIST HOST TN CDATA #IMPLIED>
<!ATTLIST HOST TMAX CDATA #IMPLIED>
<!ATTLIST HOST DMAX CDATA #IMPLIED>
<!ATTLIST HOST GMOND_STARTED CDATA #IMPLIED>
<!ELEMENT METRIC (EXTRA_DATA*)>
<!ATTLIST METRIC NAME CDATA #REQUIRED>
<!ATTLIST METRIC VAL CDATA #REQUIRED>
<!ATTLIST METRIC TYPE (string | int8 | uint8 | int16 | uint16 |
int32 | uint32 | float | double | timestamp) #REQUIRED>
<!ATTLIST METRIC UNITS CDATA #IMPLIED>
<!ATTLIST METRIC TN CDATA #IMPLIED>
<!ATTLIST METRIC TMAX CDATA #IMPLIED>
<!ATTLIST METRIC DMAX CDATA #IMPLIED>
<!ATTLIST METRIC SLOPE (zero | positive | negative | both |
unspecified) #IMPLIED>
<!ATTLIST METRIC SOURCE (gmond) 'gmond'>
<!ELEMENT EXTRA_DATA (EXTRA_ELEMENT*)>
<!ELEMENT EXTRA_ELEMENT EMPTY>
<!ATTLIST EXTRA_ELEMENT NAME CDATA #REQUIRED>
<!ATTLIST EXTRA_ELEMENT VAL CDATA #REQUIRED>
<!ELEMENT HOSTS EMPTY>
<!ATTLIST HOSTS UP CDATA #REQUIRED>
<!ATTLIST HOSTS DOWN CDATA #REQUIRED>
<!ATTLIST HOSTS SOURCE (gmond | gmetad) #REQUIRED>
<!ELEMENT METRICS (EXTRA_DATA*)>
<!ATTLIST METRICS NAME CDATA #REQUIRED>
<!ATTLIST METRICS SUM CDATA #REQUIRED>
<!ATTLIST METRICS NUM CDATA #REQUIRED>
<!ATTLIST METRICS TYPE (string | int8 | uint8 | int16 | uint16 |
int32 | uint32 | float | double | timestamp) #REQUIRED>
<!ATTLIST METRICS UNITS CDATA #IMPLIED>
<!ATTLIST METRICS SLOPE (zero | positive | negative | both |
unspecified) #IMPLIED>
<!ATTLIST METRICS SOURCE (gmond) 'gmond'>
]>
<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
<CLUSTER NAME="my cluster" LOCALTIME="1299597872" OWNER="unspecified"
LATLONG="unspecified" URL="unspecified">
</CLUSTER>
</GANGLIA_XML>
Connection closed by foreign host.
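For anyone who wants to repeat the telnet check above across several nodes, here is a minimal Python sketch. It assumes gmond is listening on its tcp_accept_channel (port 8649 in my config) and that it closes the connection after writing the XML, as in the session above. The function names fetch_gmond_xml and summarize_hosts are my own and are not part of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the full XML dump that gmond writes on its tcp_accept_channel."""
    chunks = []
    # create_connection tries each resolved address in turn (e.g. ::1, then 127.0.0.1)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize_hosts(xml_bytes):
    """Return {cluster name: {host name: number of METRIC elements}}."""
    root = ET.fromstring(xml_bytes)
    summary = {}
    for cluster in root.iter("CLUSTER"):
        hosts = {}
        for host in cluster.findall("HOST"):
            hosts[host.get("NAME")] = len(host.findall("METRIC"))
        summary[cluster.get("NAME")] = hosts
    return summary
```

Calling summarize_hosts(fetch_gmond_xml("compute001")) against the output shown above should give an empty host dictionary for "my cluster", since the CLUSTER element contains no HOST entries.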
The differences between my gmond.conf and the default gmond.conf:
compute001:/etc/ganglia # diff /etc/ganglia/gmond.conf /etc/ganglia/gmond.conf.default
23c23
< name = "my cluster"
---
> name = "unspecified"
43c43
< host = head-eth0
---
> mcast_join = 239.2.11.71
49a50
> mcast_join = 239.2.11.71
50a52
> bind = 239.2.11.71
Important snippet from my gmond.conf on the compute node (with file line numbers):
22 cluster {
23 name = "my cluster"
24 owner = "unspecified"
25 latlong = "unspecified"
26 url = "unspecified"
27 }
30 host {
31 location = "unspecified"
32 }
36 udp_send_channel {
43 host = head-eth0
44 port = 8649
45 ttl = 1
46 }
49 udp_recv_channel {
50 port = 8649
51 }
55 tcp_accept_channel {
56 port = 8649
57 }
Does anybody know what I might be doing wrong that prevents the compute
nodes from reporting their metrics?
Thanks,
Jeff
From: Bernard Li <[email protected]>
Date: 03/03/2011 04:47 PM
To: Jeffrey L Moon <[email protected]>
Cc: [email protected]
Subject: Re: [Ganglia-general] Including a Head Node in Ganglia Monitoring
Hi Jeffrey:
On Thu, Mar 3, 2011 at 2:34 PM, Jeffrey L Moon <[email protected]> wrote:
> Thanks for the advice. I saw somewhere that you can't have the cluster
> name in nagios be the same as the head node.
> Is that the case? I thought I had tried the option you recommended
> before, but with those being the same and it acted
I think you meant "Ganglia" as opposed to "Nagios", right?
Anyway, that statement is not true -- where did you read it?
The only caveat is that the cluster name displayed in the web frontend is
the one defined by the "name" parameter in the cluster{} clause of
gmond.conf on the host that the gmetad data_source points to. The name
that follows the data_source parameter in gmetad.conf plays no role in this.
Cheers,
Bernard