FYI,
I finally got through enough other work to take a fresh look at
this problem (gmetad leaving gaps in graphs). It was due to a bug in
one of my gmetric scripts, which was putting a / in the metric name. When
I corrected that, the gaps went away.
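
For anyone hitting the same thing, here is a minimal sketch of the kind of
sanitizing that avoids it. The helper name safe_metric_name and the set of
allowed characters are my own assumptions for illustration, not anything
ganglia defines; the only point taken from the fix is that "/" must not
end up in the metric name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of ganglia): map anything outside a
# conservative character set to "_" before using it as a metric name.
sub safe_metric_name {
    my ($name) = @_;
    $name =~ s/[^A-Za-z0-9_.-]/_/g;   # "/" and other odd characters become "_"
    return $name;
}

# Example: a mount point appended to a metric prefix
print safe_metric_name("disk_percent_used/usr/local"), "\n";
```

The result can then be passed to gmetric via --name as in the scripts below.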
Matt,
attached are two updated gmetric scripts that are slightly different from
the old versions in the ganglia gmetric repository.
-ryan
On Tue, 24 Sep 2002, Ryan Sweet wrote:
>
> Sorry if this mail is more appropriate to the users list. I'll carry it
> over there if that turns out to be the case. As it is, I'm trying to use
> the vast improvement that is 2.5.0 to get some time from management to
> work on ganglia.
>
> I've got several small (<32 nodes) clusters of Linux systems, with freebsd
> or Linux file servers that are working great with gmond.
>
> I'm running gmetad on a Linux 2.4.19 system (RH 7.1+updates). It sees
> the clusters just fine, and the web front end makes great graphs of them,
> etc....
>
> I'm also using gmond to monitor our workstation network (mix of IRIX,
> FreeBSD, Linux), with the same gmetad collecting the data; herein lies the
> problem.
>
> With the old gmond (2.4.1) things mostly worked, though we often
> had IRIX machines where gmond would just silently segfault and never be
> heard from. We also had a problem with machines (also mostly the IRIX)
> being marked as down from time to time when they (and their gmond) were
> actually fine, nevertheless, it was usable, and mostly consistent.
>
> With 2.5.0, gmond is much more stable, and it has stopped marking live
> hosts as dead, however on the workstation network (which happens to be the
> same network as the gmetad server), the web frontend is showing graphs
> that have large gaps in them. The values reported for "now" always
> appear to be correct, but the values are graphed incorrectly.
>
> For an example (not live, just a dump to html), see
> http://wwwx.atos-group.nl/admn/gmetad_ex/gmetad.html
>
> This particular graph is for a Linux system, but it is on the same
> multi-cast channel as the IRIX systems....
>
> So where should I begin to look? I suspect that it is actually a problem
> with gmond, most likely on the IRIX systems, since gmetad and the web
> front end are working great on the clusters.
>
> regards,
> -Ryan
>
>
--
Ryan Sweet <[EMAIL PROTECTED]>
Atos Origin Engineering Services
http://www.aoes.nl
#!/usr/bin/perl
#
# a simple script to find the %cpu used by the top users
#
#
my $debug;
if ( $ARGV[0] eq "-d" ) { $debug=1; }
my $gmetric="gmetric";
my @ps=`ps aux| grep -v USER`; # RS: get ps aux output and skip the first line
my $users;
# RS: iterate over each line of the ps output
foreach my $line (@ps)
{
# RS: split the line on whitespace, assigning vars
my ($user,$pid,$cpu,$mem,$vsz,$rss,$tty,$stat,$start,$time,$command,@args) =
split(/\s+/, $line);
# RS: populate the hash %users with references to the cumulative
# cpu,mem,time vars
$users->{$user}{cpu}+=$cpu;
$users->{$user}{mem}+=$mem;
$users->{$user}{vsz}+=$vsz;
my ($min,$sec)=split(/:/,$time);
$sec+=($min*60);
$users->{$user}{time}+=$sec; # accumulate the converted seconds, not the raw min:sec string
$users->{$user}{procs}+=1; # total number of procs per user
}
# RS: if debug, print the data structure (also makes a nice report)
if ($debug)
{
use Data::Dumper;
print Dumper($users);
}
# RS: for each user that was found, send the stats to gmond
foreach my $user (keys %$users)
{
# cpu total
system("$gmetric --name=cpu_percent_$user --value=$users->{$user}{cpu} --type=float --units=\%cpu");
# mem total
system("$gmetric --name=mem_percent_$user --value=$users->{$user}{mem} --type=float --units=\%mem");
# vsz total
system("$gmetric --name=mem_vsz_kb_$user --value=$users->{$user}{vsz} --type=float --units=kilobytes");
# cputime total
system("$gmetric --name=cpu_total_time_sec_$user --value=$users->{$user}{time} --type=float --units=seconds");
# processes total
system("$gmetric --name=procs_total_$user --value=$users->{$user}{procs} --type=float --units=processes");
}
#!/usr/bin/perl
# contributed by ryan sweet <[EMAIL PROTECTED]>
# v0.2
my $gmetric="gmetric";
# RS: get info from df, leave out first line
my @df = `df -kl | grep -v "Filesystem"`;
my $calcpercentused;
foreach (@df) # RS: for each line of df output
{
my @line = split(/\s+/, $_); # RS: split the line on whitespace
# RS: reverse the order of @line - this is because IRIX df outputs
# different items than linux
my @reversed = reverse @line;
# RS: the filesystem size is the fifth element in the reversed list
my $size = $reversed[4];
my $used = $reversed[3];
# RS: calculated percent used (df gets it wrong sometimes) is
# (100(used))/size
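# skip lines without a positive numeric size, to avoid dividing by zero
next unless defined $size && $size =~ /^\d+$/ && $size > 0;
$used = $used * 100;
$calcpercentused = int($used/$size);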
my $fsname=$reversed[0]; # RS: get the mount point (last field, whatever the column count)
$fsname =~ s/\//_/g; # RS: replace / with _
if ($fsname eq "_") { $fsname="_root"; }
# RS: send the data to gmond using gmetric
system("$gmetric --name=disk_percent_used$fsname --value=$calcpercentused --type=uint8 --units=percent_used");
}