>>> On 1/10/2011 at 4:52 PM, in message
<aanlktinbmlmnbcti3q-sjuocmp=+igaggo0trj3gf...@mail.gmail.com>, Bernard Li
<bern...@vanhpc.org> wrote:
> Hi Brad:
> 
> Thanks for your reply.
> 
> On Mon, Jan 10, 2011 at 8:06 AM, Brad Nicholes <bnicho...@novell.com> wrote:
> 
>> The purpose of setting the send_metadata_interval to 0 by default was to 
> avoid unnecessary traffic for our default configuration of multicast.  
> Setting the directive to anything other than 0 will cause each gmond to start 
> sending all of its metric metadata on that interval.  If you are going to set 
> it by default, IMO 30 seconds is too low.  The problem is that people only 
> notice this in the first few minutes after restarting a gmond.  They expect 
> metrics to start showing up immediately.  After the gmond node finally does 
> send its metadata, rebroadcasting the metadata at any interval is just 
> consuming unnecessary bandwidth on the network, especially in a multicast 
> environment where it isn't needed at all.  Also consider that the more gmond 
> nodes you have, the more traffic you are going to put on the network, where 
> 99% of the time the extra traffic is totally unnecessary.
> 
> I have a perhaps naive question.  It sounds like
> send_metadata_interval is only relevant to unicast configuration, so
> why is multicast affected as well?  How difficult a code change
> would it be to make the send_metadata_interval directive affect
> only unicast?
> 

We could add code to gmond to always disable interval-based metadata 
resending for multicast.  But that is effectively what the default value of 0 was already doing.  


> Also, multicast is the default configuration for historical reasons,
> not because it is more common.  It is, however, easier to set up if
> your environment supports it.  Is it time for us to evaluate whether
> we should switch to unicast as the default?  And if so how?  What is
> the actual spread between unicast and multicast users?  If it turns
> out that the majority of our (new) users are using unicast, should we
> spend more time/effort making it easier for them to use Ganglia?
> 

Actually, I think this is a good idea.  In my experience, unicast seems to be 
more the norm than the exception now.  If we were to make unicast the 
default, that would make the suggestion above more relevant.  We would 
probably want to put something in the code to automatically disable 
interval-based metadata sending for multicast.
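
A rough sketch of what that could look like, assuming gmond knows at startup whether its send channel is multicast or unicast.  The names below (Transport, effective_metadata_interval) are hypothetical illustrations, not existing gmond code, and real gmond is written in C:

```python
# Hypothetical sketch: these names do not exist in gmond; they only
# illustrate "ignore send_metadata_interval when sending over multicast".
from enum import Enum


class Transport(Enum):
    MULTICAST = "multicast"
    UNICAST = "unicast"


def effective_metadata_interval(transport: Transport, configured_seconds: int) -> int:
    """Return the metadata resend interval that would actually be used.

    Multicast peers can request missing metadata on the shared channel,
    so periodic resending is forced off (0) there.  Unicast senders use
    whatever send_metadata_interval was configured.
    """
    if configured_seconds < 0:
        raise ValueError("send_metadata_interval must be >= 0")
    if transport is Transport.MULTICAST:
        return 0
    return configured_seconds
```

With this in place, a non-zero default (say 300) would only cost bandwidth on unicast setups, which is where it is actually needed.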


>> 300 or 600 seconds is probably good enough for a default.  But no matter 
> what the default is, users still have to understand what that directive is 
> for and how to optimize it.  The value of send_metadata_interval will 
> probably be different for every installation when you take into consideration 
> the number of nodes, the number of metrics and any other network related 
> variables.
> 
> A couple more ideas came out of a brief brainstorming session on IRC
> between Vladimir, Jesse and myself:
> 
> 1) Collector gmond should request metadata from all gmonds when it has
> been freshly (re)started

This already happens in multicast mode.  Whenever a gmond node receives a 
metric packet for which it has no metadata, it automatically sends out a 
request on the channel for metadata.  The end result is that all gmond nodes 
are constantly resyncing themselves until all nodes in a cluster have a 
complete metadata picture.  However, the same cannot be done for unicast 
because, by definition, there is no two-way communication.  To make 
the same functionality work for unicast, we would have to introduce a new 
listen port on every gmond that would accept commands and respond to them.  
Doing that opens up a security risk that would have to be dealt with 
correctly. 
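
A toy in-memory model of the multicast resync behavior described above, where a node that sees a value without metadata asks the shared channel for it.  It assumes, for illustration only, that the owning host answers by re-sending all of its metadata; the class and method names are hypothetical and this is not gmond's code or wire format:

```python
# Toy model of multicast metadata resync.  Names are hypothetical and the
# "owner re-sends everything" response is an assumption for illustration.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    own_metadata: dict  # metric name -> metadata (units, type, ...)
    known: dict = field(default_factory=dict)  # (host, metric) -> metadata

    def __post_init__(self):
        # A node always knows the metadata for its own metrics.
        for metric, meta in self.own_metadata.items():
            self.known[(self.name, metric)] = meta


class MulticastChannel:
    """Every message is delivered to every node, as on a multicast group."""

    def __init__(self, nodes):
        self.nodes = nodes

    def send_value(self, host, metric):
        for node in self.nodes:
            if node.name != host and (host, metric) not in node.known:
                # Value arrived without metadata: ask the channel for it.
                self.request_metadata(host)

    def request_metadata(self, host):
        # The owning host re-sends its metadata; every listener picks it up.
        owner = next(n for n in self.nodes if n.name == host)
        for metric, meta in owner.own_metadata.items():
            for node in self.nodes:
                node.known[(host, metric)] = meta
```

A freshly restarted collector recovers as soon as it sees one value.  In a unicast setup the collector has no return path to the sender, so there is nothing equivalent to request_metadata to call, which is the gap described above.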

> 2) Add a configuration check for gmond so upon starting, if
> configuration is unicast-based, and send_metadata_interval is 0, warn
> the user to set it to a sane number.

This would be a good idea no matter what else we do.  
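
A minimal sketch of such a startup check.  It operates on a hypothetical already-parsed dictionary whose keys mirror the gmond.conf names (globals, send_metadata_interval, udp_send_channel with host or mcast_join); the dictionary layout and the function name are assumptions, since real gmond parses gmond.conf in C:

```python
# Sketch of the proposed gmond startup check.  The dict layout and the
# function name are hypothetical; only the directive names come from gmond.conf.
import warnings


def check_metadata_config(config: dict) -> list:
    """Return (and emit) warnings for unicast sending with metadata resend disabled."""
    problems = []
    channels = config.get("udp_send_channel", [])
    # A send channel with a host and no mcast_join is treated as unicast.
    uses_unicast = any("host" in ch and "mcast_join" not in ch for ch in channels)
    interval = config.get("globals", {}).get("send_metadata_interval", 0)
    if uses_unicast and interval == 0:
        problems.append(
            "unicast udp_send_channel configured but send_metadata_interval is 0; "
            "a collector that restarts will lose this host's metadata until this "
            "gmond is restarted. Set send_metadata_interval to a non-zero value."
        )
    for msg in problems:
        warnings.warn(msg)
    return problems
```

Returning the messages as well as warning keeps the check easy to unit test and lets a caller decide whether to treat it as fatal.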

> 3) Find a middle-ground default for send_metadata_interval that does
> not hurt new users in the HPC space who want to use unicast
> 
> 2) and 3) are workarounds which could be implemented relatively
> quickly, 1) maybe not so much.

Agreed.
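
To help pick the middle ground in 3), a back-of-the-envelope estimate of the steady-state bandwidth spent on periodic metadata resending.  It assumes, for illustration only, one metadata packet per metric per interval and a 200-byte packet; neither figure is measured from gmond:

```python
# Back-of-the-envelope metadata traffic estimate.  The one-packet-per-metric
# model and the 200-byte default are illustrative assumptions, not gmond figures.

def metadata_bytes_per_second(nodes: int, metrics_per_node: int,
                              interval_s: int, bytes_per_packet: int = 200) -> float:
    """Average bandwidth used by re-sending every metric's metadata each interval."""
    if interval_s <= 0:
        return 0.0  # 0 disables periodic resending entirely
    return nodes * metrics_per_node * bytes_per_packet / interval_s


for interval in (30, 300, 600):
    rate = metadata_bytes_per_second(1000, 30, interval)
    print(f"interval={interval:>3}s  ~{rate:,.0f} bytes/s")
```

With 1000 nodes and 30 metrics each, that works out to 200,000 bytes/s at a 30 second interval versus 20,000 bytes/s at 300 seconds and 10,000 bytes/s at 600 seconds.  The cost grows linearly with nodes and metrics and shrinks linearly with the interval.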

Brad

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers