Thanks for all the input. Perhaps unsurprisingly, we are already looking at most of this for dealing with a large cluster, but I'm happy to see that we're not the biggest :)

The RRD issues are good to know. We have a _very_ large disk array that we can write to, and I already ordered the collection server with 32 GB of RAM, so that bottleneck should be easy to avoid. We're also already planning to break the cluster up into chunks.

Sadly, I don't think anyone has been able to answer my question about jitter. It's not a simple overhead issue: you could have something that costs 1 ms of overhead on each node per minute, but if that 1 ms delays a calculation the other nodes are waiting on, that 1 ms impacts all of the nodes. Then another node takes its 1 ms hit a moment later, slowing the calculation down again.
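
To put numbers on what I mean, here is a toy simulation of that effect (not a measurement of gmond; the node count, step time and noise rate below are all made up). Each step ends in a barrier, so the slowest node sets the pace for everyone:

# Toy model of noise amplification in a tightly coupled, bulk-synchronous job.
# Illustrative only -- the parameters below are invented, not measured.
import random

NODES = 1000              # hypothetical cluster size
ITERATIONS = 6000         # compute steps, each ending in a barrier
STEP = 0.001              # 1 ms of useful work per node per step
NOISE = 0.001             # a 1 ms interruption...
NOISE_PROB = 1.0 / 60000  # ...hitting each node roughly once per minute of steps

random.seed(0)

ideal = ITERATIONS * STEP
actual = 0.0
for _ in range(ITERATIONS):
    # The step finishes only when the slowest node does.
    slowest = max(
        STEP + (NOISE if random.random() < NOISE_PROB else 0.0)
        for _ in range(NODES)
    )
    actual += slowest

per_node_overhead = NOISE_PROB * NOISE / STEP
print("per-node overhead  : %.4f%%" % (100 * per_node_overhead))
print("collective slowdown: %.2f%%" % (100 * (actual - ideal) / ideal))

Even though each node only loses a tiny fraction of its time, the whole job slows down by roughly that fraction times the number of nodes, because at almost every barrier somebody is the unlucky one.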

The question could be answered if someone compared performance with all the gmonds off versus on for a latency-sensitive application; some large Gaussian job would do it.

On Nov 7, 2007, at 6:28 PM, Richard Grevis wrote:

Douglas,

What Matthias said is good.

At one stage we had a grid of 6,000 servers in maybe 50+ clusters with
5-10 second polling (!!!).

Here are my experiences and some tips, some of which you will already know:
- The overhead of the gmond agent is very low on the monitored hosts,
both for CPU and network I/O. Not storing any local data is a Good Thing.

- Network overhead from UDP data is really, really low. In our case we
unicast the UDP to headnodes. Headnode CPU load was still really small.

- Your first (and also biggest) bottleneck is calling rrdupdate and writing the
  RRD data to the filesystem. Many posts talk about this. We used a SAN for the
  RRD files; others put them on a tmpfs with a periodic rsync backup (a rough
  sketch of that approach follows this list). Run strace on gmetad and you will
  see what I mean.

- gmetad spawns one thread per data_source as best I can tell, and each thread
  does the TCP/XML data retrieval and then the RRD updates. This affected us
  because of the 5-10 second polling of data sources.

- Personally I like 10 second polling, but it depends on your typical job
  durations.
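
A rough sketch of the tmpfs backup idea mentioned above (the paths, interval and rsync flags are just placeholders for whatever fits your layout; you would also copy the files back onto the tmpfs at boot before starting gmetad):

#!/usr/bin/env python
# Periodically copy RRDs held on a tmpfs back to persistent storage so a
# reboot does not lose them. Sketch only -- paths and interval are made up.
import subprocess
import time

TMPFS_RRDS = "/var/lib/ganglia/rrds/"      # gmetad's RRD directory, mounted as tmpfs
BACKUP_DIR = "/data/ganglia-rrd-backup/"   # persistent copy on the disk array
INTERVAL = 600                             # seconds between syncs

while True:
    # -a preserves times/permissions, --delete keeps the backup from going stale
    subprocess.run(["rsync", "-a", "--delete", TMPFS_RRDS, BACKUP_DIR], check=False)
    time.sleep(INTERVAL)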

Tips?
- Make a grid, chopping your cluster up into several data sources. It helps on the display side too!

- Integer values returned from gmond still give rise to RRD files that are
  updated at the poll rate, even if they are constant (e.g. CPU clock
  speed). Remove the ones you don't need or morph them into string values
  (see the first sketch after these tips).

- Gaps in graphed data? For us it was each gmetad thread being unable to do
  everything it had to within the polling interval window. The ganglia server
  itself did not run out of overall CPU; in fact its CPU usage was quite low.

- We also got the occasional gap exactly on the hour. Matt Toy postulated that this was the moment RRD had to update its aggregated values.

- Make gmetric scripts on the ganglia server that report I/O wait, disk
  service time, etc. (see the second sketch after these tips). Spikes in I/O
  wait correlated with the gaps for us. Umm. Mostly.
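
For the string-value trick, one way (assuming you can turn the built-in metric off in your gmond configuration) is to publish the constant yourself with gmetric as a string, since string metrics don't get RRDs. The gmetric path and metric name here are just examples:

#!/usr/bin/env python
# Publish a constant (CPU clock speed) as a string metric so it does not get
# an RRD rewritten every poll. Sketch only -- the gmetric path and the metric
# name are examples.
import subprocess

def cpu_mhz():
    # Read the clock speed from /proc/cpuinfo (Linux).
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("cpu MHz"):
                return line.split(":")[1].strip()
    return "unknown"

subprocess.run([
    "/usr/bin/gmetric",
    "--name", "cpu_speed_str",
    "--value", cpu_mhz(),
    "--type", "string",
], check=False)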
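
And for the I/O wait tip, a minimal wrapper along these lines, run from cron or a loop on the ganglia server (the metric name, units and 5 second sampling window are arbitrary choices):

#!/usr/bin/env python
# Sample aggregate I/O wait from /proc/stat over a short window and push it
# to ganglia via gmetric. Minimal sketch; adjust names and window to taste.
import subprocess
import time

def cpu_counters():
    # First line of /proc/stat: "cpu  user nice system idle iowait irq softirq ..."
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]

before = cpu_counters()
time.sleep(5)
after = cpu_counters()

deltas = [b - a for a, b in zip(before, after)]
total = sum(deltas)
iowait_pct = 100.0 * deltas[4] / total if total else 0.0   # fifth counter is iowait

subprocess.run([
    "/usr/bin/gmetric",
    "--name", "iowait_pct",
    "--value", "%.1f" % iowait_pct,
    "--type", "float",
    "--units", "%",
], check=False)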

regards,
Richard G


--



Doug Nordwall
Unix Administrator
EMSL Computer and Network Support
Unclassified Computer Security
Phone: (509)372-6776; Fax: (509)376-0420
The best book on programming for the layman is "Alice in Wonderland"; but that's because it's the best book on anything for the layman.



