Yeah, telnetting to the "remote" gmetads is just fine. It's as soon as I upgrade the gmetad talking to those that I stop getting info.
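For the record, the check amounts to something like this (hostname and port are placeholders, not the real ones):

  # dump the summary XML a downstream gmetad publishes on its xml_port
  telnet gmetad-storage.example.org 8657

  # or non-interactively, just the first few lines of it
  nc gmetad-storage.example.org 8657 | head -20

Both come back right away with a full <GANGLIA_XML> document containing the expected <GRID>/<CLUSTER> elements, so the downstream side looks healthy.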
Bug report time...

On Wed, Feb 22, 2012 at 5:39 AM, Alexander Karner <a...@de.ibm.com> wrote:
> Hi!
>
> Please check if your remote gmetad's export data by running
> telnet <host> <xml port>
>
> --> I had a similar behaviour as I upgraded my central gmetad to 3.2.0. No
> data was collected from the other grids, but running the telnet command
> returned a long list of XML data.
> Switching back to 3.1.7 solved the problem.
>
> If your remote systems are able to export data on the XML port, your central
> gmetad seems to have the same problem that I had.
>
>
> From: Matthew Nicholson <matthew.a.nichol...@gmail.com>
> To: ganglia-general@lists.sourceforge.net
> Date: 19.02.2012 19:07
> Subject: [Ganglia-general] nested grids questions/issues
> ________________________________
>
> So, I recently inherited a large Ganglia installation we use to
> monitor our HPC cluster and associated services. Due to our module, we
> have the need to break our cluster and storage up into smaller
> "clusters" that serve specific purposes, aggregate those into grids,
> and then those grids into another "master" level grid, though in one
> case there is a third level of grid aggregation.
>
> This is all unicast based, and we (I'm sure I'll be told to do
> otherwise, but that's not an option currently) run ~55 gmonds and ~
> gmetads on our "ganglia" box. Everything communicates to this on a
> range of unicast ports.
>
> More info on our nesting:
> Master (gmetad) -> 3 other gmetads -> lots and lots of gmonds
>                 -> 1 gmetad for storage -> 2 gmetads (lustre + nfs) -> lots of gmonds
>
> That's basically it.
>
> Okay, so this works. It is currently working, but the gmetads fall
> over from time to time, and everything is running Ganglia 3.1.4. We would
> like to get everything up to 3.3.0/1, and update our web frontend as well.
>
> I've been updating the gmonds service-side without issues, and the
> immediate parent gmetads (that is, gmetads that only collect from
> gmonds) also without issue.
>
> However, as soon as I restart a gmetad that polls other gmetads (the
> gmetad_storage, for example), I get no summary information at all. The
> only change is I'm starting a different binary in the init script. It
> runs/starts without error, and with debugging I get:
>
> Going to run as user nobody
> Sources are ...
> Source: [NFS, step 15] has 1 sources
>   127.0.0.1
> Source: [Lustre, step 15] has 1 sources
>   127.0.0.1
> xml listening on port 8657
> interactive xml listening on port 8658
> cleanup thread has been started
> Data thread 1168345408 is monitoring [NFS] data source
>   127.0.0.1
> Data thread 1178835264 is monitoring [Lustre] data source
>   127.0.0.1
>
> Whereas, with the older 3.1.4 binary:
>
> Going to run as user nobody
> Sources are ...
> Source: [NFS, step 15] has 1 sources
>   127.0.0.1
> Source: [Lustre, step 15] has 1 sources
>   127.0.0.1
> xml listening on port 8657
> interactive xml listening on port 8658
> Data thread 1170368832 is monitoring [NFS] data source
>   127.0.0.1
> Data thread 1180858688 is monitoring [Lustre] data source
>   127.0.0.1
> cleanup thread has been started
> [NFS] is a 2.5 or later data stream
> hash_create size = 50
> hash->size is 53
> Found a <GRID>, depth is now 1
> Found a </GRID>, depth is now 0
> Writing Summary data for source NFS, metric storage_local__nfs_cleanenergy1_size
> Writing Summary data for source NFS, metric disk_free
> Writing Summary data for source NFS, metric storage_local__nfs_nobackup2_percent_used
> Writing Summary data for source NFS, metric storage_local__itc1_percent_used
> Writing Summary data for source NFS, metric storage_local__mnt_emcback7_size
> Writing Summary data for source NFS, metric storage_local__nfs_atlascode_size
> Writing Summary data for source NFS, metric bytes_out
> etc etc etc
>
> I've been unable to find much on issues like this, no noted changes to
> the way gmetad can read downstream gmetads, and no obvious config
> options in 3.3.0.
>
> Am I missing something?
> I'll happily provide gmetad configs if needed.
>
> --
> Matthew Nicholson

--
Matthew Nicholson
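P.S. In case it helps anyone trying to reproduce this, here is a trimmed-down sketch of how a storage-level gmetad in a layout like ours is wired up. The grid name and the child ports are placeholders (only 8657/8658 match the listening ports in the debug output above), so treat it as illustrative rather than our exact config:

  # gmetad_storage.conf (sketch; child xml_port numbers are made up)
  gridname "Storage"
  # each data_source points at the xml_port of a child gmetad on this box
  data_source "NFS"    15 127.0.0.1:8651
  data_source "Lustre" 15 127.0.0.1:8653
  # ports this gmetad itself exports on
  xml_port 8657
  interactive_port 8658

The master gmetad then lists this box's 8657 as one of its own data_source entries, and that hand-off is exactly where the summaries disappear once the polling gmetad is upgraded.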