Leaving the clarity and findability of the documented answer aside,
does anyone know the actual answer to the original question?

We have a cluster with about twenty gmond nodes and one gmetad host.
host_dmax is set to 3600.
Hosts that die never just disappear from the set of graphs.
We go through a nondeterministic procedure of restarting all of the gmonds over and over again,
and eventually the dead graphs disappear and the gmonds begin reporting again.
But I haven't been able to pin down exactly which restart
actually removes them, or why restarting a gmond usually makes it stop reporting.
The README says "ALL dead hosts will be flushed from the record by restarting the processes",
but there seems to be more to it than that. It looks as if they have to be restarted in a certain order,
but we haven't figured out what that order is.
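For readers unfamiliar with the setting, here is a sketch of where host_dmax is assumed to live: the globals section of gmond.conf, as in the stock 3.1-era configuration. The gmond.conf documentation describes it as the number of seconds after which gmond flushes a host it has not heard from, with 0 meaning hosts are never deleted. Only the relevant line is shown; whether this also removes the graphs that gmetad and the web frontend already hold is a separate question.

```
globals {
  /* Flush a host from gmond's host list after 3600 seconds (one hour)
     without hearing from it; 0 would mean hosts are never deleted. */
  host_dmax = 3600 /*secs */
}
```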

Googling for this answer, I find a lot of folks asking it, and some other folks telling
them to read the whole FAQ or the entire wiki, as if they hadn't tried that already.
I also found some advice to shut the whole thing down, and then bring it all back up again.
That's close, but we have to do it multiple times, and can't figure out why.
Maybe someone who understands ganglia internals can explain it.

-Cameron



Vladimir Vuksan wrote:
I second Martin's request. This has been an ongoing issue, so we ought to
simply change the send_metadata_interval default to e.g. 30 seconds or so. We can put a comment
in the config file saying that if you are in a multicast environment you may want to
set this to 0.

What's the downside of setting it != 0? A bit more network traffic?

Anyways I think we should shoot for sensible defaults that work for most
people. 
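Here is a sketch of what the proposed default and its comment might look like in the globals section of gmond.conf. The 30-second value comes from the suggestion above; the comment wording is illustrative only and is not taken from any released config file.

```
globals {
  /* Periodically resend metric metadata so that a restarted unicast
     collector can interpret incoming metrics again. In a multicast
     environment you may want to set this to 0. */
  send_metadata_interval = 30 /*secs */
}
```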

On Thu, 18 Nov 2010 02:44:13 -0800 (PST), Martin Knoblauch
<[email protected]> wrote:
  
Hi Bernard,

----- Original Message ----

    
From: Bernard Li <[email protected]>
To: Louis Coilliot <[email protected]>
Cc: [email protected]
Sent: Wed, November 17, 2010 9:16:22 PM
Subject: Re: [Ganglia-general] restarting the gmond collector node causes no data to be reported

Hello:

This is actually documented in both the release notes and the FAQs in
our wiki:

http://sourceforge.net/apps/trac/ganglia/wiki

Please let us know if anything is unclear.

Thanks,

Bernard
      
Besides the fact that this is really unclear and difficult to find, we may want to
consider a different default for unicast mode. It is always better not to
let people run into foreseeable problems.

Cheers
Martin

    
On Wed, Nov 17, 2010 at 1:14 PM, Louis Coilliot
<[email protected]> wrote:
      
Hello, this behaviour is reported from time to time with unicast :)

Use

send_metadata_interval = 600

(600, for example) in the gmond.conf on your nodes.

The metrics should come back after a while.

Louis
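Here is a minimal sketch of this advice in a unicast gmond.conf on a sending node. The collector hostname below is hypothetical; 8649 is gmond's default port. Only the settings relevant here are shown.

```
globals {
  /* Resend metric metadata at most every 600 seconds. With the
     default of 0, metadata is sent only at startup (and on request). */
  send_metadata_interval = 600 /*secs */
}

/* Unicast: send to a single collector gmond. */
udp_send_channel {
  host = collector.example.com  /* hypothetical collector hostname */
  port = 8649
}
```

With this in place, after the collector gmond restarts, each node's metrics should reappear within roughly one send_metadata_interval.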

        
    
------------------------------------------------------------------------------
  
Beautiful is writing same markup. Internet Explorer 9 supports
standards for HTML5, CSS3, SVG 1.1,  ECMAScript5, and DOM L2 & L3.
Spend less time writing and  rewriting code and more time creating great
experiences on the web. Be a part of the beta today
http://p.sf.net/sfu/msIE9-sfdev2dev
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
    



