[Ganglia-developers] Follow-up: resolved: Re: 2.5.0 experiences

Ryan Sweet Wed, 06 Nov 2002 06:04:20 -0800

FYI,
I finally got through enough other work to take a fresh look at 
this problem (gmetad leaving gaps in graphs).  It was due to a bug in 
one of my gmetric scripts, which was putting a / in the metric name.  When 
I corrected that, the gaps went away.
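
For anyone writing their own gmetric wrappers, one way to guard against this is to normalize the metric name before handing it to gmetric. The following is only a minimal sketch: the helper name safe_metric_name and the set of allowed characters (letters, digits, "_", "." and "-") are assumptions made for illustration, not something Ganglia mandates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside a conservative
# whitelist with "_" so that a "/" can never end up in a metric name.
sub safe_metric_name {
    my ($name) = @_;
    $name =~ s/[^A-Za-z0-9_.-]/_/g;
    return $name;
}

# Example: a mount point used as part of a metric name.
print safe_metric_name("disk_percent_used/export/home"), "\n";
```

The sanitized name can then be interpolated into the --name= argument in place of the raw string.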


Matt,
attached are two updated gmetric scripts that differ slightly from 
the old versions in the ganglia gmetric repository. 

-ryan

On Tue, 24 Sep 2002, Ryan Sweet wrote:

> 
> Sorry if this mail is more appropriate to the users list.  I'll carry it 
> over there if that turns out to be the case.  As it is, I'm trying to use 
> the vast improvement that is 2.5.0 to get some time from management to 
> work on ganglia.
> 
> I've got several small (<32 nodes) clusters of Linux systems, with FreeBSD
> or Linux file servers, that are working great with gmond.
> 
> I'm running gmetad on a Linux 2.4.19 system  (RH 7.1+updates).  It sees 
> the clusters just fine, and the web front end makes great graphs of them, 
> etc....
> 
> I'm also using gmond to monitor our workstation network (mix of IRIX,
> FreeBSD, Linux), with the same gmetad collecting the data; herein lies the
> problem.  
> 
> With the old gmond (2.4.1) things mostly worked, though we often 
> had IRIX machines where gmond would just silently segfault and never be 
> heard from.  We also had a problem with machines (also mostly the IRIX) 
> being marked as down from time to time when they (and their gmond) were 
> actually fine.  Nevertheless, it was usable, and mostly consistent.
> 
> With 2.5.0, gmond is much more stable, and it has stopped marking live 
> hosts as dead.  However, on the workstation network (which happens to be the 
> same network as the gmetad server), the web frontend is showing graphs 
> that have large gaps in them.  The values reported for "now" always 
> appear to be correct, but the values are graphed incorrectly.
> 
> For an example (not live, just a dump to html), see 
> http://wwwx.atos-group.nl/admn/gmetad_ex/gmetad.html
> 
> This particular graph is for a Linux system, but it is on the same 
> multicast channel as the IRIX systems....
> 
> So where should I begin to look?  I suspect that it is actually a problem 
> with gmond, most likely on the IRIX systems, since gmetad and the web 
> front end are working great on the clusters.  
> 
> regards,
> -Ryan
> 
> 

-- 
Ryan Sweet <[EMAIL PROTECTED]>
Atos Origin Engineering Services
http://www.aoes.nl

#!/usr/bin/perl
#
# a simple script to find the %cpu used by the top users
#
#
my $debug;
if ( $ARGV[0] eq "-d" ) { $debug=1; }
my $gmetric="gmetric";

my @ps=`ps aux| grep -v USER`; # RS: get ps aux output and skip the first line
my $users; 

# RS: iterate over each line of the ps output
foreach my $line (@ps) 
{

        # RS: split the line on whitespace, assigning vars
        my ($user,$pid,$cpu,$mem,$vsz,$rss,$tty,$stat,$start,$time,$command,@args)
                = split(/\s+/, $line);

        # RS: populate the hash ref $users with the cumulative
        # cpu, mem, vsz and time totals for each user
        $users->{$user}{cpu}+=$cpu;
        $users->{$user}{mem}+=$mem;
        $users->{$user}{vsz}+=$vsz;

        # RS: convert the TIME column (min:sec) to seconds before summing
        my ($min,$sec)=split(/:/,$time);
        $sec+=($min*60);
        $users->{$user}{time}+=$sec;
        $users->{$user}{procs}+=1; # total number of procs per user
        
}

# RS: if debug, print the data structure (also makes a nice report)
if ($debug)
{
        use Data::Dumper;
        print Dumper($users);
}

# RS: for each user that was found, send the stats to gmond
foreach my $user (keys %$users)
{
        # cpu total
        system("gmetric --name=cpu_percent_$user --value=$users->{$user}{cpu} --type=float --units=\%cpu");
        # mem total
        system("gmetric --name=mem_percent_$user --value=$users->{$user}{mem} --type=float --units=\%mem");
        # vsz total
        system("gmetric --name=mem_vsz_kb_$user --value=$users->{$user}{vsz} --type=float --units=kilobytes");

        # cputime total
        system("gmetric --name=cpu_total_time_sec_$user --value=$users->{$user}{time} --type=float --units=seconds");
        # processes total
        system("gmetric --name=procs_total_$user --value=$users->{$user}{procs} --type=float --units=processes");
                        

}

#!/usr/bin/perl
# contributed by ryan sweet <[EMAIL PROTECTED]>
# v0.2
my $gmetric="gmetric";
# RS: get info from df, leave out first line
my @df = `df -kl | grep -v "Filesystem"`;

my $calcpercentused;
foreach (@df)   # RS: for each line of df output
{
        my @line = split(/\s+/, $_); # RS: split the line on whitespace
        # RS: reverse the order of @line - this is because IRIX df outputs
        # different items than linux
        my @reversed = reverse @line;
        # RS: the filesystem size is the fifth element in the reversed list
        my $size = $reversed[4];
        my $used = $reversed[3];
        # RS: calculated percent used (df gets it wrong sometimes) is
        # (100 * used) / size
        $used = $used * 100;
        $calcpercentused = int($used/$size);
        my $fsname=$reversed[0]; # RS: get the mount point (last column, whatever the df layout)
        $fsname =~ s/\//_/g; # RS: replace / with _
        if ($fsname eq "_") { $fsname="_root"; }
        # RS: send the data to gmond using gmetric
        system("gmetric --name=disk_percent_used$fsname --value=$calcpercentused --type=uint8 --units=percent_used"); 
}
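
The percent-used arithmetic above can be checked without running df. What follows is a minimal sketch, assuming df -k style lines in which the mount point is the last column and the size and used columns are fifth and fourth from the end (as in the Linux layout "Filesystem 1K-blocks Used Available Use% Mounted on"). The function name percent_used and the sample line are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: take one line of df -k output and return
# (mount point, integer percent used), or an empty list if the line
# cannot be parsed or reports a zero-sized filesystem.
sub percent_used {
    my ($line) = @_;
    my @rev = reverse split ' ', $line;   # count columns from the end
    return unless @rev >= 5;
    my ($mount, $used, $size) = @rev[0, 3, 4];
    return unless $size =~ /^\d+$/ && $used =~ /^\d+$/ && $size > 0;
    return ($mount, int($used * 100 / $size));
}

# Made-up sample line, not real df output.
my ($mount, $pct) = percent_used("/dev/hda1 1000000 250000 750000 25% /");
print "$mount $pct\n";
```

Counting columns from the end keeps the same indices usable when a df variant adds extra columns after the device name, and the zero-size check avoids a division by zero.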
