Hi list(eners),

we are using Ganglia to monitor about 20-30 AIX systems. All data is 
collected on 2 LPARs which run gmetad, rrdtool and Apache, so users can 
reach it through the supplied PHP web frontend.
Our RRD database is currently about 400 MB in size and consists of 
about 4800 files. I have no idea whether this size or number of files 
is a problem at all.
Every day, several users connect to the web frontend and open those 
graphs, which triggers a number of rrdtool calls that use up all the 
available CPU time of the machine, no matter whether it's just 4 rrdtool 
processes or 20 of them. The machine is already at its limit when 
working with 4 rrdtool calls. When about 20 of them are started, a 
two-digit number of kernel threads is queued...
When this happens, you cannot type properly in an ssh session, and even 
some shell scripts called by cron have problems and produce delayed 
output. This is not peak behaviour - it's constantly like this.
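
The database figures given above (about 400 MB in about 4800 files) are easy to re-check on any Ganglia host. The following is only a minimal Python sketch, assuming Python is available on the box; the function name rrd_inventory is made up for illustration, and the default path is simply the one visible in the rrdtool command line further down, so adjust it to your installation.

```python
import os


def rrd_inventory(root):
    """Return (file_count, total_bytes) for all *.rrd files below root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".rrd"):
                count += 1
                total += os.path.getsize(os.path.join(dirpath, name))
    return count, total


if __name__ == "__main__":
    # Assumed location of the RRD tree; taken from the ps output in this mail.
    files, size = rrd_inventory("/var/lib/ganglia/rrds")
    print(f"{files} rrd files, {size / 2**20:.1f} MiB")
```

Running it periodically would show whether the number of files or the total size keeps growing as hosts and metrics are added.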

The LPAR runs AIX 5.3 with 1 x 1.7 GHz CPU and 2 GB RAM. VMM is tuned and 
the machine is usually not paging at all. It is not a strong machine, but 
we thought it would be sufficient for Ganglia + rrdtool + Apache.

Other applications on the machine are not noticeable, as you can see (topas):
Name            PID  CPU%  PgSp Owner
rrdtool      458924  26.0   0.5 nobody
rrdtool      966772  25.4   0.4 nobody
rrdtool     1302770  23.0   3.6 root
rrdtool     1183858  22.4   0.4 nobody
gmetad      1110100   8.2   5.3 nobody
topas       1159188   0.6   1.7 root
gmond        352448   0.1   1.7 nobody
java         848106   0.0  16.2 root
sched         12294   0.0   0.4 root
clstrmgr     405650   0.0  22.5 root



A snippet from vmstat so you can see for yourself (currently 4 rrdtool processes are running):
kthr     memory             page               faults          cpu
----- ------------ ----------------------- --------------- -----------
 r  b    avm   fre  re  pi  po  fr  sr  cy   in    sy   cs us sy id wa
 5  1 304078  4075   0   0   0   0   0   0  285 22896 1079 90 10  0  0
 6  1 303783  4373   0   0   0   0   0   0  249  2038  787 98  2  0  0
 6  0 304004  4145   0   0   0   0   0   0  239 17285 1001 92  8  0  0
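
To put the vmstat sample above into a few numbers, here is a minimal Python sketch. It assumes the 17-column layout shown in the header (r b avm fre re pi po fr sr cy in sy cs us sy id wa); the function name summarize_vmstat and the SAMPLE list are illustrative only, and the rows are the three data lines from this mail.

```python
# Data rows copied from the vmstat output in this mail.
SAMPLE = [
    "5  1 304078  4075   0   0   0   0    0   0 285 22896 1079 90 10  0  0",
    "6  1 303783  4373   0   0   0   0    0   0 249 2038 787 98  2  0  0",
    "6  0 304004  4145   0   0   0   0    0   0 239 17285 1001 92  8  0  0",
]


def summarize_vmstat(lines):
    """Average the run queue and the CPU columns over vmstat data rows."""
    rows = [[int(v) for v in line.split()] for line in lines if line.strip()]
    if not rows:
        raise ValueError("no vmstat data rows given")

    def avg(col):
        return sum(row[col] for row in rows) / len(rows)

    # Column indexes follow the header order:
    # r=0 ... us=13, sy=14, id=15, wa=16
    return {
        "runq": avg(0),
        "user": avg(13),
        "sys": avg(14),
        "idle": avg(15),
        "wait": avg(16),
    }


if __name__ == "__main__":
    for key, value in summarize_vmstat(SAMPLE).items():
        print(f"{key}: {value:.1f}")
```

For the three rows shown, this gives an average run queue of about 5.7 threads on a single CPU with no idle time at all, which matches the sluggish behaviour described above.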


Here is a sample rrdtool process as shown by ps:
/usr/bin/rrdtool graph --start 1197456160 --end 1197459760 --width 300 
--height 75 --title weight - DEF:sum=/var/lib/ganglia/rrds/PL1650 Linie 
SRZ/<somehostname>/weight.rrd:sum:AVERAGE AREA:sum#0000ff:<somehostname> 
last hour (now -1.00)

My questions:
Is it normal behaviour/configuration that so many rrdtool processes are 
started when people make requests to the web frontend?
Do you dedicate a "big" box just to running Ganglia + rrdtool + Apache, 
or do you use just a small Intel/Linux box and it runs smoothly?

Any suggestions/hints are welcome; thank you in advance.

Greetings
Markus

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
