Yeah, telnetting to the "remote" gmetads is just fine. It's as soon as I upgrade the gmetad talking to those that I stop getting info.
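For the record, the check amounts to something like this (hostname and port are placeholders, not the real ones):

  # dump the summary XML a downstream gmetad publishes on its xml_port
  telnet gmetad-storage.example.org 8657

  # or non-interactively, just the first few lines of it
  nc gmetad-storage.example.org 8657 | head -20

Both come back right away with a full <GANGLIA_XML> document containing the expected <GRID>/<CLUSTER> elements, so the downstream side looks healthy.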
Bug report time...

On Wed, Feb 22, 2012 at 5:39 AM, Alexander Karner <a...@de.ibm.com> wrote:
> Hi!
>
> Please check if your remote gmetad's export data by running
> telnet <host> <xml port>
>
> --> I had a similar behaviour as I upgraded my central gmetad to 3.2.0. No
> data was collected from the other grids, but running the telnet command
> returned a long list of XML data.
> Switching back to 3.1.7 solved the problem.
>
> If your remote systems are able to export data on the XML port, your central
> gmetad seems to have the same problem that I had.
>
>
> From: Matthew Nicholson <matthew.a.nichol...@gmail.com>
> To: ganglia-general@lists.sourceforge.net
> Date: 19.02.2012 19:07
> Subject: [Ganglia-general] nested grids questions/issues
> ________________________________
>
> So, I recently inherited a large Ganglia installation we use to
> monitor our HPC cluster and associated services. Due to our module, we
> have the need to break our cluster and storage up into smaller
> "clusters" that serve specific purposes, aggregate those into grids,
> and then those grids into another "master" level grid, though in one
> case there is a third level of grid aggregation.
>
> This is all unicast based, and we (I'm sure I'll be told to do
> otherwise, but that's not an option currently) run ~55 gmonds and ~
> gmetads on our "ganglia" box. Everything communicates to this on a
> range of unicast ports.
>
> More info on our nesting:
> Master (gmetad) -> 3 other gmetads -> lots and lots of gmonds
>                 -> 1 gmetad for storage -> 2 gmetads (lustre + nfs) -> lots of gmonds
>
> That's basically it.
>
> Okay, so this works. It is currently working, but the gmetads fall
> over from time to time, and everything is running Ganglia 3.1.4. We would
> like to get everything up to 3.3.0/1, and update our web frontend as well.
>
> I've been updating the gmonds service-side without issues, and the
> immediate parent gmetads (that is, gmetads that only collect from
> gmonds) also without issue.
>
> However, as soon as I restart a gmetad that polls other gmetads (the
> gmetad_storage, for example), I get no summary information at all. The
> only change is I'm starting a different binary in the init script. It
> runs/starts without error, and with debugging I get:
>
> Going to run as user nobody
> Sources are ...
> Source: [NFS, step 15] has 1 sources
>   127.0.0.1
> Source: [Lustre, step 15] has 1 sources
>   127.0.0.1
> xml listening on port 8657
> interactive xml listening on port 8658
> cleanup thread has been started
> Data thread 1168345408 is monitoring [NFS] data source
>   127.0.0.1
> Data thread 1178835264 is monitoring [Lustre] data source
>   127.0.0.1
>
> Whereas, with the older 3.1.4 binary:
>
> Going to run as user nobody
> Sources are ...
> Source: [NFS, step 15] has 1 sources
>   127.0.0.1
> Source: [Lustre, step 15] has 1 sources
>   127.0.0.1
> xml listening on port 8657
> interactive xml listening on port 8658
> Data thread 1170368832 is monitoring [NFS] data source
>   127.0.0.1
> Data thread 1180858688 is monitoring [Lustre] data source
>   127.0.0.1
> cleanup thread has been started
> [NFS] is a 2.5 or later data stream
> hash_create size = 50
> hash->size is 53
> Found a <GRID>, depth is now 1
> Found a </GRID>, depth is now 0
> Writing Summary data for source NFS, metric storage_local__nfs_cleanenergy1_size
> Writing Summary data for source NFS, metric disk_free
> Writing Summary data for source NFS, metric storage_local__nfs_nobackup2_percent_used
> Writing Summary data for source NFS, metric storage_local__itc1_percent_used
> Writing Summary data for source NFS, metric storage_local__mnt_emcback7_size
> Writing Summary data for source NFS, metric storage_local__nfs_atlascode_size
> Writing Summary data for source NFS, metric bytes_out
> etc etc etc
>
> I've been unable to find much on issues like this, no noted changes to
> the way gmetad can read downstream gmetads, and no obvious config
> options in 3.3.0.
>
> Am I missing something?
> I'll happily provide gmetad configs if needed.
>
> --
> Matthew Nicholson

--
Matthew Nicholson
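P.S. In case it helps anyone trying to reproduce this, here is a trimmed-down sketch of how a storage-level gmetad in a layout like ours is wired up. The grid name and the child ports are placeholders (only 8657/8658 match the listening ports in the debug output above), so treat it as illustrative rather than our exact config:

  # gmetad_storage.conf (sketch; child xml_port numbers are made up)
  gridname "Storage"
  # each data_source points at the xml_port of a child gmetad on this box
  data_source "NFS"    15 127.0.0.1:8651
  data_source "Lustre" 15 127.0.0.1:8653
  # ports this gmetad itself exports on
  xml_port 8657
  interactive_port 8658

The master gmetad then lists this box's 8657 as one of its own data_source entries, and that hand-off is exactly where the summaries disappear once the polling gmetad is upgraded.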