FWIW, I find gmetad utterly unusable in my configuration unless its RRDs live on tmpfs. I don't believe the problem is specifically rrdtool, since I have a cacti instance polling several hundred devices for hundreds of thousands of OIDs every five minutes (a near-continual RRD write/update period)... and that has very little impact on IOwait or disk contention, such that I still run it directly off a local array.

gmetad, on the other hand, will completely crush any system I try to run it on, even against the _fastest_ local disk I can throw at it. This is with a client base of (on average) 1000 nodes in 5-8 clusters. When run out of tmpfs, I maintain a load of 0.01 and a network bandwidth utilization of ~250KB/sec... when run against a fast local disk with an external write journal, it will flatten a quad-proc Opteron to the point where SSH response is _really_ slow, without showing more than a few MB/sec of reads/writes against the disk... it just looks like very inefficient IO patterns.

I haven't put as much effort into looking at the problem as Gilad has, since the tmpfs fix works fine... but IMO it probably has something to do with the way rrdtool is being called, not just its use against that many RRDs.

/eli

Jason A. Smith wrote:
Hi Gilad,

I thought I remembered a sort of mini HOWTO or FAQ on the old
ganglia web page which gave suggestions on how to set up ganglia, but I
can't find it now.

Anyway, I think ganglia's heavy IO requirements (mostly from rrdtool)
are fairly well known to long time users, and each has probably come up
with their own way around it.  Here, we are using a diskless database
directory for ganglia's rrd area, by using Linux's tmpfs:

/etc/fstab contains this line:
none  /var/lib/ganglia/rrds  tmpfs  size=1024M,mode=755,uid=nobody,gid=nobody  0 0

The uid & gid options should match your gmetad.conf's setuid setting.
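
To confirm that the rrd directory really ended up on tmpfs after mounting, something along these lines might help. It is only a sketch: find_mount is a hypothetical helper name, and it assumes a Linux /proc/mounts layout of "device mountpoint fstype options dump pass".

```python
def find_mount(mounts_text, mount_point):
    """Return (fstype, [options]) for mount_point, or None if it is not mounted.

    mounts_text is the content of a /proc/mounts-style file (an assumption:
    whitespace-separated fields, mount point second, fstype third, options fourth).
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match: a later mount shadows an earlier one.
            found = (fields[2], fields[3].split(","))
    return found
```

On the ganglia server one would read /proc/mounts, pass its text together with "/var/lib/ganglia/rrds", and check that the returned fstype is "tmpfs".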

Then we back up the database directory using tar every night just to
prevent complete data loss in case our ganglia server crashes.
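
For anyone who would rather script the nightly tar step, here is a rough sketch using Python's standard tarfile module. The function names, the archive naming scheme and the backup directory are all assumptions for illustration, not what we actually run here.

```python
import os
import tarfile
import time


def backup_rrds(rrd_dir, backup_dir):
    """Write a gzip-compressed tar of rrd_dir into backup_dir; return the archive path."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d")
    final_path = os.path.join(backup_dir, "rrds-%s.tar.gz" % stamp)
    tmp_path = final_path + ".part"
    # Write to a temporary name first so a crash mid-backup never leaves
    # a truncated file under the final name.
    with tarfile.open(tmp_path, "w:gz") as tar:
        tar.add(rrd_dir, arcname="rrds")
    os.replace(tmp_path, final_path)
    return final_path


def restore_rrds(archive_path, parent_dir):
    """Unpack an archive made by backup_rrds; files land in parent_dir/rrds."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(parent_dir)
```

Run backup_rrds from cron against the tmpfs mount, writing to a directory on real disk. After a reboot, restore_rrds with parent_dir set to /var/lib/ganglia would repopulate the empty tmpfs; check afterwards that the restored files are owned by the user gmetad runs as.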

~Jason


On Fri, 2006-04-14 at 13:06 -0700, Gilad Raphaelli wrote:
I'm actually seeing 100% disk busy under both rhel4 and freebsd 4.11
with just 98 nodes in 13 clusters.   My goal is to get gmetad running
on freebsd; rhel4 was just for comparison's sake.  A ktrace reveals
100s of failed mkdirs during every writing period - traceable to
rrd_helpers.c.  There don't seem to be any other significant events.
When the disk hits 100% iowait the system is unusable.

I was under the impression that a relatively low powered system could
handle something like this configuration - perhaps that is the issue?
The box is a PIII 800 with 1.5 GB mem - the rrds are stored on a
dedicated 70 GB ide disk.

Any insight would be appreciated.  I'm hanging out in #ganglia on
freenode if anyone wants to chat.

Thank you,

Gil

----- Original Message ----
From: Bernard Li <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; ganglia-[EMAIL PROTECTED]
Sent: Thursday, April 13, 2006 11:19:50 PM
Subject: [Ganglia-developers] RE: [Ganglia-general] New (final?)
tarball for ganglia-3.0.3

Hi Martin:

Finally had the time to test it, here's the text in the webpage now:

Gmetad Web Frontend version 3.0.3.200604132304 Check for Updates.
Gmetad Web Backend (gmetad) version 3.0.3.200604102000 Check for Updates.

Looks like it's fixed. BTW, I tested Ganglia on Fedora Core 5 x86 and it
is working fine. Did anybody else test 3.0.3? Somebody on IRC mentioned
that he was having issues with gmetad using up 99% CPU with a large
number of clients (50+).

Cheers, Bernard


______________________________________________________________________
From: Martin Knoblauch [mailto:[EMAIL PROTECTED]
Sent: Tue 11/04/2006 11:38
To: Bernard Li; [EMAIL PROTECTED]; ganglia-[EMAIL PROTECTED]
Subject: RE: [Ganglia-general] New (final?) tarball for ganglia-3.0.3



Bernard,

 could you please test the following patch in "web" to solve this
really really big problem :-) You need to run "./configure" to recreate
"web/version.php".

$diff -u -r1.9 ganglia.php
--- ganglia.php 25 Mar 2006 01:53:57 -0000      1.9
+++ ganglia.php 11 Apr 2006 18:34:31 -0000
@@ -33,7 +33,8 @@
 $version = array();

 # The web frontend version, from conf.php.
-$version["webfrontend"] = "$majorversion.$minorversion.$microversion";
+#$version["webfrontend"] = "$majorversion.$minorversion.$microversion";
+$version["webfrontend"] = "$ganglia_version";

 # The name of our local grid.
 $self = " ";


$diff -u -r1.1 version.php.in
--- version.php.in      10 Dec 2004 21:34:04 -0000      1.1
+++ version.php.in      11 Apr 2006 18:34:50 -0000
@@ -5,7 +5,7 @@
 $minorversion = @GANGLIA_MINOR_VERSION@;
 $microversion = @GANGLIA_MICRO_VERSION@;

-$ganglia_version = "@[EMAIL PROTECTED]@[EMAIL PROTECTED]@GANGLIA_MICRO_VERSION@";
+$ganglia_version = "@GANGLIA_VERSION@";
 $ganglia_release_name    = "@GANGLIA_RELEASE_NAME@";

 ?>


--- Bernard Li <[EMAIL PROTECTED]> wrote:

Just tested building and running on Fedora Core 4 x86, everything
checks out (minimal installation test) - did notice this minor issue
though:

Gmetad Web Frontend version 3.0.3 Check for Updates.
Gmetad Web Backend (gmetad) version 3.0.3.200604102000 Check for
Updates.

Notice the versions are different between webfrontend and gmetad - I
guess they use different sources for the version string?

Chris, are you still planning to help us test with your hardware?

Thanks,

Bernard

P.S. If anybody wants the RPMs, please ping me.

________________________________

From: [EMAIL PROTECTED] on behalf of Martin Knoblauch
Sent: Sat 08/04/2006 00:31
To: ganglia general; [email protected]
Subject: [Ganglia-general] New (final?) tarball for ganglia-3.0.3



Hi,

 as promised, I have created a new pre-3.0.3 tarball. It can be
downloaded from:

http://www.knobisoft.de/ganglia/ganglia-3.0.3.200604080900.tar.gz

 Due to the release plans for OSCAR5, this could be the last snapshot
before a release next week.

 In particular, the following problems are supposed to be solved:

- truncated XML
- bogus "old protocol" messages in dead-host detection
- gmetad will not stop updating RRDs after a previous failure
- apr-0.9.7 is now officially in CVS
- minor fixes to the webfrontend
- more minor stuff -> See the ChangeLog

Cheers
Martin

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de






