It might be worth checking the disk I/O. Disk I/O is actually the main
constraint when monitoring clusters/grids.
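
One way to check is to average the CPU columns of a vmstat sample. Here is a
rough Python sketch, assuming the 17-column AIX vmstat layout shown in the
quoted message. It is not part of Ganglia or rrdtool: the names parse_vmstat,
summarize and sy_calls are made up for illustration, and the sample rows are
copied from the quoted vmstat output. A high "wa" would point to disk I/O,
while a high "us" with "wa" near zero would point to the CPU instead.

```python
# Column names follow the AIX vmstat header quoted in this thread.
# "sy" appears twice in that header (system calls and system CPU time),
# so the first one is renamed "sy_calls" here (a hypothetical name).
COLUMNS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
           "in", "sy_calls", "cs", "us", "sy", "id", "wa"]


def parse_vmstat(text):
    """Return one dict per numeric vmstat row; headers and rulers are skipped."""
    rows = []
    for line in text.splitlines():
        # Drop any mail quote prefix ("> ") and leading spaces before splitting.
        fields = line.lstrip("> ").split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def summarize(rows):
    """Average the run queue, blocked queue and CPU split over all rows."""
    if not rows:
        raise ValueError("no numeric vmstat rows found")
    count = len(rows)
    return {key: sum(row[key] for row in rows) / count
            for key in ("r", "b", "us", "sy", "id", "wa")}


SAMPLE = """\
 r b avm fre re pi po fr sr cy in sy cs us sy id wa
 5 1 304078 4075 0 0 0 0 0 0 285 22896 1079 90 10 0 0
 6 1 303783 4373 0 0 0 0 0 0 249 2038 787 98 2 0 0
 6 0 304004 4145 0 0 0 0 0 0 239 17285 1001 92 8 0 0
"""

if __name__ == "__main__":
    for key, value in summarize(parse_vmstat(SAMPLE)).items():
        print(f"{key:>2}: {value:.1f}")
```

Three rows are a very short sample, so it would be better to feed it a
longer capture (for example the saved output of "vmstat 5 60") taken while
users are loading graphs.
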
On Dec 12, 2007 5:42 PM, Markus Reusch <[EMAIL PROTECTED]> wrote:
> Hi list(eners),
>
> We are using Ganglia to monitor about 20-30 AIX systems. All data is
> collected on 2 LPARs which run gmetad, rrdtool and Apache, so it is
> accessible to users through the supplied PHP web frontend.
> Our RRD database is currently about 400 MB in size and consists of
> about 4800 files. I have no idea whether this size or number of files is
> a problem at all.
> Every day, several users connect to the web frontend and open those
> graphs, which results in some rrdtool calls that exhaust all the
> available CPU time of the machine, no matter whether it is just 4
> rrdtool processes or 20 of them. The machine is already at its limit
> when working with 4 rrdtool calls. When about 20 of them are started, a
> two-digit number of kernel threads is queued...
> When all this occurs, you cannot type properly in your ssh
> session, and even some shell scripts called by cron have
> problems and give delayed output. This is not peak behaviour; it is
> constantly like this.
>
> The LPAR runs AIX 5.3 with 1 * 1,7 MHz CPU and 2 GB RAM. VMM is tuned and
> the machine is usually not paging at all. It is not a strong machine, but
> we thought it would be sufficient for Ganglia + rrdtool + Apache.
>
> Other applications on the machine are barely noticeable, as you can see
> (topas):
> Name          PID  CPU%  PgSp  Owner
> rrdtool    458924  26.0   0.5  nobody
> rrdtool    966772  25.4   0.4  nobody
> rrdtool   1302770  23.0   3.6  root
> rrdtool   1183858  22.4   0.4  nobody
> gmetad    1110100   8.2   5.3  nobody
> topas     1159188   0.6   1.7  root
> gmond      352448   0.1   1.7  nobody
> java       848106   0.0  16.2  root
> sched       12294   0.0   0.4  root
> clstrmgr   405650   0.0  22.5  root
>
>
>
> A snippet from vmstat so you can see for yourself (4 rrdtool processes
> are currently running):
> kthr    memory              page               faults         cpu
> ----- ----------- ------------------------ -------------- -----------
>  r  b    avm  fre   re  pi  po  fr  sr  cy  in    sy   cs us sy id wa
>  5  1 304078 4075    0   0   0   0   0   0 285 22896 1079 90 10  0  0
>  6  1 303783 4373    0   0   0   0   0   0 249  2038  787 98  2  0  0
>  6  0 304004 4145    0   0   0   0   0   0 239 17285 1001 92  8  0  0
>
>
> Here is a sample rrdtool process as shown by ps:
> /usr/bin/rrdtool graph --start 1197456160 --end 1197459760 --width 300
> --height 75 --title weight - DEF:sum=/var/lib/ganglia/rrds/PL1650 Linie
> SRZ/<somehostname>/weight.rrd:sum:AVERAGE AREA:sum#0000ff:<somehostname>
> last hour (now -1.00)
>
> My questions:
> Is it normal behaviour/configuration that so many rrdtool processes are
> started when people make requests to the web frontend?
> Do you dedicate a "big" box to running Ganglia + rrdtool + Apache, or
> does a small Intel/Linux box run it smoothly?
>
> Any suggestions/hints are welcome. Thank you in advance.
>
> Greetings
> Markus
>
> -------------------------------------------------------------------------
> SF.Net email is sponsored by:
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services for
> just about anything Open Source.
> http://sourceforge.net/services/buy/index.php
> _______________________________________________
> Ganglia-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>
--
~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=
Regards,
Aroop
~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=