Ryan Sweet wrote:
On Tue, 24 Sep 2002, Federico Sacerdoti wrote:

Just by checking the number of disk interrupts we knew disk I/O was a problem, not to mention the inconsistent-looking graphs. When we put the rrds on tmpfs, that problem went away.


At the moment disk I/O isn't the problem, though I see how it could be once the rest of the systems get added. What method are you using for backing up the rrd's from tmpfs? rsync?

Uhhhhh yeah, rsync, that's the ticket...

[Just a cron script that runs every 15 minutes - losing 15 minutes' worth of data isn't going to freak anybody out over here, since we aren't using this for accounting purposes, just general health. After we reach full deployment I'll try tweaking the frequency of updates... ]
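In case it helps anyone, the crontab entry is roughly the following; the paths here are illustrative rather than what's actually deployed:

    # copy the rrds out of tmpfs every 15 minutes; a crash just costs
    # us whatever accumulated since the last copy
    */15 * * * * rsync -a /dev/shm/ganglia/rrds/ /var/lib/ganglia/rrds-backup/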

I really hope you aren't mixing Linux, FreeBSD and IRIX nodes *WITHIN* the same cluster.


That's _precisely_ what I'm doing.

You know, it's a good thing we aren't developing open-source proton packs...

The compute clusters are Linux nodes with FreeBSD gateways. The network where I'm having trouble is the workstation network for the engineers, which is a grab bag of 32-bit and 64-bit IRIX/Linux/*BSD machines (one of the things I want to help with ASAP is getting OpenBSD to build).

OpenBSD doesn't build, eh? How odd... I guess *BSD don't have as much in common as I thought.

I don't quite understand why this is (or needs to be) a problem. Shouldn't the gmonds just hash and multicast all the metrics they receive, regardless of whether it is a metric that their own host is capable of storing? It seemed to work this way, in principle, with 2.4.1. I have a set of custom metrics (see my topusers.pl in the gmetric scripts) that are per user, and thus by nature not present on every machine. These for the most part work great... it is a really good way to see usage patterns across the network and to pin resource usage on the users responsible, in a graph that the managers can understand. I used to use nasty, hackish perl scripts to create graphs from sar reports, which were never as accurate anyway. I much prefer ganglia in this regard.
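For anyone who hasn't played with gmetric, the per-user metrics boil down to a script along these lines. This is a stripped-down sketch rather than the real topusers.pl, and the user names, values, and metric names are invented for illustration:

    #!/usr/bin/perl
    # Sketch: push per-user CPU usage into ganglia via gmetric.
    # Assumes a gmetric binary is somewhere in $PATH.
    use strict;

    # In the real script this hash is filled by parsing ps output;
    # the values here are made up.
    my %cpu_by_user = ( alice => 42.5, bob => 3.1 );

    for my $user ( keys %cpu_by_user ) {
        system( "gmetric --name=cpu_user_$user "
              . "--value=$cpu_by_user{$user} "
              . "--type=float --units=percent" );
    }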

Custom user metrics are actually given a metric hash value of 0 and are stored in a different manner. And don't blame me, it was like this when I got here. I refer you to the SF archives of this list for my numerous whiny e-mails about metric handling.

And Ganglia has worked that way since at least 2.3.x, which is about when I started taking an interest in it. It doesn't "need" to be that way; that's just the way it is... for now.

Man, I wish I could use something that simple to track jobs here. Unfortunately, to quote the great Keanu Reeves, "It's not that simple. It's complicated."



