jason-

disk i/o in gmetad is a big problem.  it's one of the problems that is 
being addressed in ganglia3 (as you mention).  here are some of the ways 
we are going to reduce disk i/o.

gmetad2 currently polls all the data from each data source every n secs.
it creates round-robin databases that assume data will be coming in every
n secs.  it's ugly because it means that even metrics that are only
updated remotely every hour (for example) are being written to disk
locally every n secs.  (to explain why i wrote it that way... i didn't
have a library that could quickly save XML data from multiple sources
into a common hierarchical data structure for easy manipulation... in a
sense a true XMLdb.  the DOM libraries out there were not portable or
fast enough and still aren't.  to get around that limitation i
immediately parsed all the XML, saved the data to RRDs and then saved
the *raw* XML for output on port 8651... i know... excuses excuses :)).
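
to make the problem concrete, here's a rough sketch (python, not the
actual gmetad source) of what gmetad2 effectively does: one fixed --step
for every RRD, so even an hourly metric gets an every-n-secs database.
the paths and RRA sizing below are just illustrative.

  import subprocess

  STEP = 15  # the global poll interval, "n secs"

  def create_rrd(path, step=STEP):
      # gmetad2-style: the same fixed step for every metric
      subprocess.run([
          "rrdtool", "create", path,
          "--step", str(step),
          "DS:sum:GAUGE:%d:U:U" % (step * 2),   # heartbeat = 2x step
          "RRA:AVERAGE:0.5:1:5760",             # ~1 day of 15s samples
      ], check=True)

  # an hourly metric still gets a 15-second RRD (and 15-second writes)
  create_rrd("/var/lib/ganglia/rrds/host1/cpu_speed.rrd")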

i have written a basic XMLdb that can quickly collect/merge xml data from
many data sources (and allow for quick summarizing and filtering).  this
will empower us to make gmetad3 a lot more efficient and intelligent.
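
for a feel of the XMLdb idea, here's a toy sketch in python (not the
real implementation; the layout and function names are my own, though
the HOST/METRIC/NAME/VAL names follow gmond's XML output).  it merges
per-source XML into one hierarchy and supports a quick summarize:

  import xml.etree.ElementTree as ET
  from collections import defaultdict

  db = defaultdict(dict)   # db[source][host] -> {metric name: value}

  def merge(source, xml_text):
      # fold one data source's XML dump into the shared hierarchy
      root = ET.fromstring(xml_text)
      for host in root.iter("HOST"):
          metrics = {m.get("NAME"): m.get("VAL")
                     for m in host.iter("METRIC")}
          db[source][host.get("NAME")] = metrics

  def summarize(source, metric):
      # sum a numeric metric across every host in one source
      return sum(float(h[metric])
                 for h in db[source].values() if metric in h)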

gmetad3 will poll data from multiple data sources just like gmetad2 but 
with some huge differences.  gmetad3 will immediately write the incoming 
data to its hierarchical data structure.  this will allow us to create a 
custom RRD format for each individual metric.  if a metric is only updated 
every hour then the RRD will be set up to expect data once an hour.
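
a quick sketch of that per-metric setup (python and rrdtool's command
line again; deriving the step from each metric's own update interval is
the idea, the details below are guesses):

  import subprocess

  def create_metric_rrd(path, interval):
      # size the RRD around the metric's own update interval
      subprocess.run([
          "rrdtool", "create", path,
          "--step", str(interval),
          "DS:sum:GAUGE:%d:U:U" % (interval * 2),
          "RRA:AVERAGE:0.5:1:%d" % max(86400 // interval, 1),  # ~1 day
      ], check=True)

  create_metric_rrd("/tmp/load_one.rrd", 15)     # fast-moving metric
  create_metric_rrd("/tmp/cpu_speed.rrd", 3600)  # hourly: 1 write/hour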

gmetad3 will also be dealing with interactive data sources.  currently 
gmond2/gmetad2 are non-interactive and simply spit out all or nothing.  
having interactive data sources means that gmetad3 will be able to send a 
request for only "new" data.  say gmetad3 polls every minute: it would pass 
60 to the remote source, which would return all data that is <=60 seconds 
old.  another possibility is persistent connections.
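
a hypothetical client for that kind of exchange (the wire format here,
a bare number of seconds plus a newline, is my assumption and not a
settled protocol):

  import socket

  def poll_new_data(host, port, max_age=60):
      # ask the remote source for data no older than max_age seconds
      with socket.create_connection((host, port)) as s:
          s.sendall(b"%d\n" % max_age)
          chunks = []
          while True:
              buf = s.recv(8192)
              if not buf:
                  break
              chunks.append(buf)
      return b"".join(chunks).decode()

  xml = poll_new_data("cluster-head.example.com", 8651)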

gmetad3 will also use a delegation model instead of an aggregation model, 
which will allow the RRDbs to be distributed instead of centralized in 
one place.
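
roughly, delegation means a gmetad answers for its own subtree and hands
everything else to the gmetad that owns it, instead of pulling the whole
grid's data down to one place.  a toy routing sketch (the hostnames and
path scheme are invented):

  DELEGATES = {
      "/grid/physics":   "gmetad-physics.example.com",
      "/grid/chemistry": "gmetad-chem.example.com",
  }

  def resolve(path):
      # find the gmetad whose local RRDs hold this subtree
      for prefix, host in DELEGATES.items():
          if path.startswith(prefix):
              return host
      return "localhost"   # our own subtree; answer from local RRDs

  print(resolve("/grid/physics/cluster1/node07/load_one"))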

if you have any questions or suggestions, or would like to be a part of
the coding process, please let us know.
-- 
matt

Today, Jason A. Smith wrote forth saying...

> Since we have been slowly increasing the number of clusters and hosts
> that we are monitoring with ganglia, I have been watching closely how
> the gmetad host is handling the increased load and experimenting with a
> few alternatives for locations to store the rrds.
> 
> At first it was just on a partition made up of a pair of RAIDed disks,
> which obviously didn't scale very far.  Then I tried moving to a
> filesystem image mounted via the loopback device.  I think this helped
> to aggregate the thousands of disk accesses into updates of a single
> file, but as the total size grew to a few hundred megabytes (several
> hundred nodes), the disk I/O started to cause too much of a load on the
> gmetad node.  One advantage of this method is that it is fairly easy to
> set up, and the rrds are still being written to a physical disk so you
> don't lose any data on a reboot.
> 
> Then I tried experimenting with ramfs, since it is much easier to set
> up than a ramdisk, but I had a few problems with it that I suspect
> might be bugs in the Linux kernel.  Since the databases are now stored
> only in RAM, I had a cronjob that would run every hour to back up the
> rrds directory.  Occasionally a process (either gmetad or tar) would go
> into an uninterruptible sleep state which would lock up the ramfs
> partition.  I would be forced to reboot in order to continue collecting
> data.  Also, it appeared that there was an almost 50% overhead in using
> ramfs (my 225MB rrds directory would consume about 337MB of RAM).
> 
> Then I finally settled on a ramdisk, although it takes a little more
> effort to set up and use.  The performance appears to be about the same
> as ramfs, without the lockups and 50% overhead, which really helps to
> alleviate the disk I/O load on the gmetad node.
> 
> So, how do other people handle their large database directories?  Is
> everyone using a ramdisk or has anyone used ramfs successfully?  How
> different will things be in the upcoming ganglia3 release?  Will the
> rrds be basically the same as they are now or will there be major
> changes in that part of ganglia also?
> 
> ~Jason