i'd also mention that the udp buffer issue is particularly relevant if you have a lot of custom metrics coming from gmetric via cron: many hosts with aligned clocks all send at the exact same time, so the packets hit the receiver in one burst.
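
one way to spread out those aligned sends is to give each host a stable offset inside the cron interval. this is only a rough sketch and not something from my setup: the function name `send_delay_seconds` and the 50 second window are assumptions made up for illustration, and the cron job itself would still call gmetric after sleeping.

```python
import hashlib


def send_delay_seconds(hostname: str, window_seconds: int = 50) -> float:
    """Map a hostname to a stable delay in [0, window_seconds).

    Hypothetical helper: the same host always gets the same delay, while
    different hosts land at different points in the window, so their
    packets do not all arrive in the same instant.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an unsigned integer.
    value = int.from_bytes(digest[:8], "big")
    # Reduce to millisecond resolution inside the window.
    return (value % (window_seconds * 1000)) / 1000.0


# Intended use in a cron-driven script (not executed here):
#   import socket, time
#   time.sleep(send_delay_seconds(socket.gethostname()))
#   ...then run gmetric as usual...
```

a hash is used instead of a random number so a given host reports at the same offset every run, which keeps the spacing between its samples even.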
one other piece i've added to generate this many metrics is a patch to gmetric which takes a simplified csv as input, as opposed to forking gmetric for every metric you want to send. (i don't think there are ganglia/gmetric bindings for perl that would allow me to send bulk metrics natively from my go-to language of choice.) sending bulk metrics this way probably also pushes the udp buffer limit. if anyone's interested i'll post the gmetric_csv patch.

-scott

On Wed, 17 Feb 2010, Scott Dworkis wrote:

> i'm new to the community and not sure where to share this info, so i'll fling
> it at the mailing list for now:
>
> i've been told 150k metrics is a lot. when researching performance and
> dynamics, you never know where you might find interesting data, so my
> approach is to collect as much as possible. i'm thinking i may double this
> in the next few months for my ~360 host environment.
>
> the keys so far (besides having a 64 bit gmetad host with 32G of memory) have
> been in sysctl.conf:
>
>
> #svd increase udp buffer size to 100M for ganglia
> net.core.rmem_max = 104857600
> net.core.rmem_default = 104857600
>
> #svd tune fs cache for ganglia
> vm.dirty_ratio = 100
> vm.dirty_background_ratio = 100
> vm.dirty_expire_centisecs = 720000
>
>
> a clue that you are bumping up against udp buffer limits is if you see a lot
> of dropouts in metrics graphs across the board, and see a near constant
> increase in:
>
> netstat -s | grep RcvbufErrors
>
> a clue that you are bumping into io cache is also a lot of dropouts in
> metrics graphs, a lot of cpu io-wait time, and a consistent showing of
> pdflush ranked high in bin/top. i suspect there are some risks with this
> cache tuning, so use with caution.
>
>
> the ganglia ui is probably not quite consolidated enough to visualize this
> much data... for this i'm using drraw, which works pretty great but is
> showing signs of scaling limits also...
> i'm planning to build a custom ui and
> hoping it may be clean enough one day to contribute to the community.
>
> -scott

_______________________________________________
Ganglia-general mailing list
https://lists.sourceforge.net/lists/listinfo/ganglia-general
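
the "near constant increase" check on RcvbufErrors from the quoted message can also be scripted instead of eyeballing netstat. the sketch below is an assumption-laden illustration, not part of ganglia: it assumes a linux host where /proc/net/snmp exposes the Udp counters (the data netstat -s summarizes), and the function names are hypothetical.

```python
from typing import Sequence


def udp_rcvbuf_errors(snmp_text: str) -> int:
    """Extract the Udp RcvbufErrors counter from /proc/net/snmp style text.

    The file holds pairs of lines per protocol: a header line with field
    names, then a line with the matching values.
    """
    header = None
    for line in snmp_text.splitlines():
        # "Udp:" does not match "UdpLite:", which has its own counters.
        if not line.startswith("Udp:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields
        else:
            counters = dict(zip(header, fields))
            return int(counters["RcvbufErrors"])
    raise ValueError("no Udp counters found")


def read_udp_rcvbuf_errors(path: str = "/proc/net/snmp") -> int:
    """Read the current counter from the running system (Linux assumed)."""
    with open(path) as handle:
        return udp_rcvbuf_errors(handle.read())


def steadily_increasing(samples: Sequence[int]) -> bool:
    """True when every sample is larger than the one before it."""
    return len(samples) >= 2 and all(
        later > earlier for earlier, later in zip(samples, samples[1:])
    )
```

sampling `read_udp_rcvbuf_errors()` once a minute and feeding the last few values to `steadily_increasing` would flag a receiver that keeps dropping packets, as opposed to one that dropped a few during a single burst.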

