On Tue, 2004-03-16 at 14:36, Jason A. Smith wrote:
> Hi Matt,
>
> If you are going to do this then would you be able to make this
> optional?  The reason I ask is that we currently have a single gmetad
> server monitoring several clusters totaling over 1100 nodes, which comes
> to only about 370MB of space for all of the rrds.  Because of the amount
> of IO needed to update these, the rrds are stored entirely in RAM
> (tmpfs), which is possible because of their small size.  The change you
> are suggesting is to increase this almost 6 times, which would require
> over 2GB of RAM just for the rrds.
just to be clear: with rrdtool, increasing the size of the database does
not necessarily increase the amount of disk i/o.  currently we have
multiple round-robin archives inside each round-robin database.  i'm
trying to do two things:

1.) reduce the number of archives as much as possible, and
2.) completely change the heartbeats and intervals so that only value
changes touch disk.

1.) reduces the amount of reading and writing that rrdtool does for each
insert/update to the database, because there are fewer archives to
update.

2.) is the most dramatic io helper.  currently, we have some polling
interval (say 15 seconds), and gmetad writes data to the rrds regardless
of whether the value changes or not: "foo has 1 cpu.  foo still has 1
cpu.  foo has 1 cpu.  foo still has 1 cpu."  it's ugly.  i just wrote it
to work, without building it to work well.

the new way will configure the RRDs to have HUGE heartbeats.  a
heartbeat is the maximum interval you can go without receiving data
before the value is marked NaN (Unknown).  in short: assume the number
hasn't changed unless i tell you otherwise.  if a metric/host stops
reporting, gmetad will actively put NaNs into the databases instead of
relying on passive heartbeats.

i would like to make it configurable, but that would be a real task.
once set, people would never be able to change it; if they did, they
would have to convert every database to handle the new configuration.  i
don't think we want to support that.  we need to think more about this.
maybe it could be done, but i'm not sure.

-matt

> Maybe you can even make how the rrds are stored customizable where the
> gmetad and webfrontend would read some kind of config file allowing the
> user complete control over the rrd format.  If this is too difficult
> then something like a config option to specify full-sized or mini
> databases would be enough.
>
> ~Jason
>
> On Tue, 2004-03-16 at 15:28, Matt Massie wrote:
> > what is the maximum size of an rrd that you would tolerate?
> > what is a reasonable size?  it is currently 11948 bytes per metric
> > and double that for summary metrics.
> >
> > that means that a 128 node cluster monitoring 30 metrics each would
> > take about 11948*128*30 + 11948*2*30 = 44.5 MBs.  tiny.
> >
> > i'd like to expand the size of the round-robin databases to around
> > 70620 bytes per metric.  that means that a 128 node cluster
> > monitoring 30 metrics each would take around
> > 70620*128*30 + 70620*2*30 = 262 MBs.  small.
> >
> > it would allow hourly averages for a year.  it would give you the
> > power to ask what was going on last week with more fine-grained
> > accuracy.
> >
> > keep in mind that the disk io is not going to go up; it will drop
> > significantly given the new design.
> >
> > -matt
> > --
> > Mobius strippers never show you their back side
> > PGP fingerprint 'A7C2 3C2F 8445 AD3C 135E F40B 242A 5984 ACBC 91D3'
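[Editorial note: the big-heartbeat scheme Matt describes could look roughly like the fragment below.  The filename, data-source name, and RRA layout are illustrative assumptions, not the actual gmetad configuration; the fourth DS field is the heartbeat in seconds.]

```
# Hypothetical rrdtool create for the proposed scheme.
# DS:<name>:GAUGE:<heartbeat>:<min>:<max> -- a 30-day heartbeat means an
# unchanged value need not be re-written every polling interval; gmetad
# would write NaN explicitly when a host stops reporting.
rrdtool create cpu_num.rrd --step 15 \
    DS:sum:GAUGE:2592000:U:U \
    RRA:AVERAGE:0.5:240:8760   # 240 x 15s = hourly averages, kept a year
```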
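[Editorial note: the per-cluster sizes quoted in the thread can be checked with a small sketch.  The function name and the summary-RRD convention (one set of double-sized summary RRDs per cluster, as stated above) are assumptions based on the figures given, not gmetad code.]

```python
# Back-of-the-envelope check of the RRD sizes quoted in the thread.
# 11948 bytes/metric is the current RRD size; 70620 bytes/metric is the
# proposed expanded size.  Summary metrics take double the space.

def cluster_rrd_bytes(bytes_per_metric, nodes, metrics):
    """Total RRD bytes: per-node RRDs plus double-sized summary RRDs."""
    return bytes_per_metric * nodes * metrics + bytes_per_metric * 2 * metrics

MiB = 1024 * 1024
current = cluster_rrd_bytes(11948, nodes=128, metrics=30)
proposed = cluster_rrd_bytes(70620, nodes=128, metrics=30)
print(round(current / MiB, 1))   # ~44.4 (quoted as ~44.5 MB in the thread)
print(round(proposed / MiB, 1))  # ~262.7 (quoted as ~262 MB in the thread)
```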