Ryan Sweet wrote:
On Tue, 24 Sep 2002, Federico Sacerdoti wrote:
Just by checking the number of disk interrupts we knew disk I/O was a
problem, not to mention the inconsistent-looking graphs. When we put the
At the moment disk I/O isn't the problem, though I see how it could be
once the rest of the systems get added. What method are you using for
backing up the RRDs from tmpfs? rsync?
Uhhhhh yeah, rsync, that's the ticket...
[Just a cron script that runs every 15 minutes - losing 15 minutes' worth
of data isn't going to freak anybody out over here, since we aren't using
this for accounting purposes, just general health. After we reach full
deployment I'll try tweaking the frequency of updates... ]
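In case it's useful to anyone, the whole thing boils down to roughly the
following (the paths are placeholders rather than our actual layout, and the
real script may differ in the details):

    # crontab entry: sync the in-memory RRDs out to disk every 15 minutes
    */15 * * * * rsync -a --delete /mnt/rrd-tmpfs/ /var/lib/ganglia/rrd-backup/

    # at boot, restore the last snapshot into tmpfs before gmetad starts
    rsync -a /var/lib/ganglia/rrd-backup/ /mnt/rrd-tmpfs/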
I really hope you aren't mixing Linux, FreeBSD and IRIX nodes *WITHIN*
the same cluster.
That's _precisely_ what I'm doing.
You know, it's a good thing we aren't developing open-source proton packs...
The compute clusters are Linux nodes with FreeBSD gateways. The network
where I'm having trouble is the engineers' workstation network, which is a
grab bag of 32-bit and 64-bit IRIX/Linux/*BSD machines (one of the things I
want to help with ASAP is getting OpenBSD to build).
OpenBSD doesn't build, eh? How odd... I guess the *BSDs don't have as much
in common as I thought.
I don't quite understand why this is (or needs to be) a problem. Shouldn't
the gmonds just hash and multicast all the metrics they receive, regardless
of whether their own host is capable of storing a given metric? It seemed to
work this way, in principle, with 2.4.1. I have a set of custom metrics (see
my topusers.pl in the gmetric scripts) that are per-user, and thus by nature
not present on every machine. For the most part these work great... it's a
really good way to see usage patterns across the network and to pin resource
usage on the users responsible, in a graph that the managers can understand.
I used to use nasty, hackish Perl scripts to create graphs from sar reports,
which were never as accurate anyway. I much prefer ganglia in this regard.
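For anyone who hasn't looked at those scripts: they mostly just shell out to
gmetric to push arbitrary name/value pairs onto the multicast channel,
something like the line below (the metric name and value are made up, and
the exact flags may vary between ganglia versions):

    # publish one user's CPU share as a custom metric; the gmonds listening
    # on the multicast channel pick it up like any other metric
    gmetric --name="cpu_user_jdoe" --value="42.0" --type="float" --units="%"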
Custom user metrics are actually given a metric hash value of 0 and are
stored in a different manner. And don't blame me, it was like this when I
got here. I refer you to the SF archives of this list for my numerous
whiny e-mails about metric handling.
And Ganglia has worked that way since at least 2.3.x, which is about when I
started taking an interest in it. It doesn't "need" to be that way; that's
just the way it is... for now.
Man, I wish I could use something that simple to track jobs here.
Unfortunately, to quote the great Keanu Reeves, "It's not that simple.
It's complicated."