>>> On 1/7/2011 at 9:10 PM, in message
<aanlktikfk_hy2v_zvkb_pra6vxmeqnv3nw3iokhxx...@mail.gmail.com>, Jesse Becker
<haw...@gmail.com> wrote:
> On Fri, Jan 7, 2011 at 15:25, Bernard Li <bern...@vanhpc.org> wrote:
>> Hi all:
>>
>> Since the release of Ganglia 3.1, we have introduced the new
>> configuration option send_metadata_interval in gmond.conf.  This is
>> set to 0 by default, and the user must set it to a sane number if
>> using unicast; otherwise, if gmonds are restarted, hosts may appear to
>> be offline (this is documented in the release notes).  A bug has
>> already been filed:
>>
>> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=242 
>>
>> We have recently had a lot of users hit this issue, and Vladimir
>> recommends that we just set a sane number as the default and be done
>> with it, since we end up spending a lot of time on IRC/mailing-list
>> solving the same problem over and over again.
>>
>> Since there have been some commits to the 3.1 branch since tagging
>> 3.1.7, I propose we just copy the 3.1.7 tag, update
>> send_metadata_interval in the configuration file and release that as
>> 3.1.8.
>>
>> This is not the normal procedure for making a release, so I'd like to
>> get some feedback from other developers.
>>
>> BTW I am thinking of setting send_metadata_interval to 30 seconds.
>> Also, does anybody know if this setting affects multicast setups in
>> any way?
> 
> I think that it's fine to set this to a non-zero value, but I wonder
> if 30 seconds is too frequent.  I did a quick check of the actual
> packets that are sent, specifically the metadata packets.  I haven't
> been able to really delve into the code to figure out exactly what's
> going on (this part of the code isn't terribly transparent to me),
> but I *think* that they are really large: on the order of several
> KB when fully assembled, as compared to less than 100-120 bytes for a
> typical metric packet.  I think that size will increase with the
> number of metrics stored, since each one must be described in full XML
> each time.
> 
> The reason for the large size is that an entire XML description of the
> metrics appears to be sent each time.  Metadata packets also appear to
> go over TCP, not UDP.
> 
> My testing was pretty simple:
> 1) set up a gmond (from SVN, well after 3.1 came out) in unicast mode.
> 2) set 'send_metadata_interval' to 1
> 3) disable all modules, except for 'mod_core'
> 4) remove all collection groups.
> 5) start gmond, and run tcpdump.
> 
> On a large cluster, with lots of metrics per host, I can see problems
> if the metadata packets are sent too frequently.  I have hosts that
> send well over 300 metrics (lots of CPU cores makes for lots of
> metrics...).  Each of these needs to be described in the metadata
> packets.
> 
> So I think that setting a non-zero default is fine, but I think that
> something like 300 or 600 seconds would be preferable.
> 

The purpose of setting the send_metadata_interval to 0 by default was to avoid 
unnecessary traffic for our default configuration of multicast.  Setting the 
directive to anything other than 0 will cause each gmond to start sending all 
of its metric metadata on that interval.  If you are going to set it by 
default, IMO 30 seconds is too low.  The problem is that people only notice 
this in the first few minutes after restarting a gmond.  They expect metrics to 
start showing up immediately.  After the gmond node finally does send its 
metadata, rebroadcasting the metadata at any interval is just consuming 
unnecessary bandwidth on the network, especially in a multicast environment 
where it isn't needed at all.  Also consider that the more gmond nodes you have, 
the more traffic you are going to put on the network, where 99% of the time the 
extra traffic is totally unnecessary.

300 or 600 seconds is probably good enough for a default.  But no matter what 
the default is, users still have to understand what that directive is for and 
how to optimize it.  The value of send_metadata_interval will probably be 
different for every installation when you take into consideration the number of 
nodes, the number of metrics and any other network related variables.
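
As a rough illustration of how those variables interact, here is a minimal 
back-of-the-envelope sketch in Python.  It is not taken from gmond; the 
function name and every input value (500 nodes, 300 metrics per node, 200 
bytes of metadata per metric) are hypothetical placeholders that should be 
replaced with numbers measured on your own network, e.g. with tcpdump.  It 
only assumes that each node resends all of its metric metadata once per 
interval, and that an interval of 0 disables the periodic resend.

```python
def metadata_bytes_per_second(nodes, metrics_per_node, bytes_per_metric, interval_s):
    """Estimate average cluster-wide metadata traffic in bytes per second.

    Assumes every node resends the metadata for all of its metrics once
    per interval. All inputs are placeholders to be measured locally.
    """
    if interval_s <= 0:
        # An interval of 0 means metadata is not resent periodically.
        return 0.0
    return nodes * metrics_per_node * bytes_per_metric / interval_s


if __name__ == "__main__":
    # Hypothetical cluster: 500 nodes, 300 metrics each, 200 bytes per metric.
    for interval in (30, 300, 600):
        rate = metadata_bytes_per_second(500, 300, 200, interval)
        print(f"interval {interval:>3}s: {rate / 1024:.1f} KiB/s of metadata")
```

The estimate scales linearly with the node count and the metric count and 
inversely with the interval, so going from 30 to 300 seconds cuts the 
steady-state metadata traffic by a factor of ten, at the cost of a longer 
wait before a restarted gmond has the metadata again.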

Brad


_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
