i'm new to the community and not sure where to share this info, so i'll 
fling it at the mailing list for now:

i've been told 150k metrics is a lot.  when researching performance and 
dynamics, you never know where you might find interesting data, so my 
approach is to collect as much as possible.  i'm thinking i may double 
this in the next few months for my ~360 host environment.

the keys so far (besides having a 64-bit gmetad host with 32G of memory) 
have been these sysctl.conf settings:


#svd increase udp buffer size to 100M for ganglia
net.core.rmem_max = 104857600
net.core.rmem_default = 104857600

#svd tune fs cache for ganglia
vm.dirty_ratio = 100
vm.dirty_background_ratio = 100
vm.dirty_expire_centisecs = 720000
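
after adding these to /etc/sysctl.conf i reload and then sanity-check the 
effective values -- a rough sketch, assuming a typical linux box where the 
live settings are readable under /proc/sys:

```shell
# reload sysctl.conf so the new limits take effect (needs root):
#   sysctl -p /etc/sysctl.conf
# then verify the effective udp receive-buffer ceiling (no root needed):
cat /proc/sys/net/core/rmem_max
```

if the number printed is still the distro default (often a few hundred KB) 
rather than 104857600, the reload didn't take.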


a clue that you are bumping up against udp buffer limits is a lot of 
dropouts in metrics graphs across the board, combined with a near-constant 
increase in:

netstat -s | grep RcvbufErrors

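to tell a "near-constant increase" apart from one-off noise, i sample the 
counter twice and diff it.  a sketch that reads the raw counter out of 
/proc/net/snmp instead of netstat (column layout assumed from a typical 
linux kernel, where the Udp: header row names the fields):

```shell
# pull the UDP RcvbufErrors counter straight from /proc/net/snmp:
# the first Udp: line is the header, the second holds the values
get_errs() {
  awk '/^Udp:/ { if (!n++) { for (i=1;i<=NF;i++) col[$i]=i }
                 else        print $(col["RcvbufErrors"]) }' /proc/net/snmp
}
a=$(get_errs); sleep 2; b=$(get_errs)
echo "RcvbufErrors grew by $((b - a)) in 2s"
```

any steady nonzero growth here while gmond traffic is flowing means the 
kernel is throwing away metric packets before ganglia ever sees them.
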
a clue that you are bumping into the io cache is, again, a lot of dropouts 
in metrics graphs, plus a lot of cpu io-wait time and pdflush consistently 
ranked high in top.  i suspect there are some risks with this cache 
tuning, so use it with caution.
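
if you don't trust eyeballing top, io-wait can be read from /proc/stat 
directly -- a quick sketch, assuming the usual linux field order on the 
aggregate cpu line (user nice system idle iowait ...):

```shell
# sample the cpu line of /proc/stat twice and report the io-wait share
read -r _ u1 n1 s1 i1 w1 rest < /proc/stat
sleep 2
read -r _ u2 n2 s2 i2 w2 rest < /proc/stat
total=$(( (u2-u1)+(n2-n1)+(s2-s1)+(i2-i1)+(w2-w1) ))
iow=$(( w2 - w1 ))
echo "io-wait: ${iow}/${total} ticks over 2s"
```

a large io-wait share here while the graphs are dropping out points at the 
rrd write load rather than the network.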


the ganglia ui is probably not quite consolidated enough to visualize this 
much data... for that i'm using drraw, which works pretty well but is also 
showing signs of scaling limits... i'm planning to build a custom ui and 
hoping it may one day be clean enough to contribute to the community.

-scott

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
