i'm new to the community and not sure where to share this info, so i'll fling it at the mailing list for now:
i've been told 150k metrics is a lot. when researching performance and dynamics, you never know where you might find interesting data, so my approach is to collect as much as possible. i'm thinking i may double this in the next few months for my ~360 host environment.

the keys so far (besides having a 64-bit gmetad host with 32G of memory) have been these settings in sysctl.conf:

  #svd increase udp buffer size to 100M for ganglia
  net.core.rmem_max = 104857600
  net.core.rmem_default = 104857600

  #svd tune fs cache for ganglia
  vm.dirty_ratio = 100
  vm.dirty_background_ratio = 100
  vm.dirty_expire_centisecs = 720000

a clue that you are bumping up against udp buffer limits is a lot of dropouts in metrics graphs across the board, together with a near-constant increase in:

  netstat -s | grep RcvbufErrors

a clue that you are bumping into the io cache is also a lot of dropouts in metrics graphs, plus a lot of cpu io-wait time and pdflush consistently ranked high in top. i suspect there are some risks with this cache tuning, so use it with caution (a few quick sanity-check sketches are in the ps below).

the ganglia ui is probably not quite consolidated enough to visualize this much data... for this i'm using drraw, which works pretty well but is also showing signs of scaling limits. i'm planning to build a custom ui and hoping it may one day be clean enough to contribute to the community.

-scott
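ps. for anyone who wants to try this, a minimal sketch of how to apply and verify the settings without a reboot, assuming a linux host with the values in /etc/sysctl.conf:

  # load the new values from /etc/sysctl.conf
  sysctl -p

  # confirm the kernel actually picked them up
  sysctl net.core.rmem_max net.core.rmem_default
  sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs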
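to see whether you're still overflowing the udp buffer, what matters is whether the error counter keeps climbing, not just that it's nonzero. a quick loop for that (depending on your netstat version the counter may be printed as "receive buffer errors" instead of RcvbufErrors, so adjust the grep):

  #!/bin/sh
  # sample the udp receive-buffer error counter every 10 seconds;
  # a near-constant increase means traffic is still overflowing
  # the socket buffer even at 100M
  while true; do
      date
      netstat -s | grep RcvbufErrors
      sleep 10
  done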
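and to keep an eye on the fs cache side, a sketch using standard /proc files and vmstat:

  # how much dirty data the page cache is holding back; with
  # dirty_ratio at 100 this can grow very large before the kernel
  # forces writeback
  grep -E 'Dirty|Writeback' /proc/meminfo

  # watch the "wa" (io-wait) column here; consistently high values
  # plus pdflush near the top of top are the warning signs
  vmstat 5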

