Doug,

OK, I gave the paper a careful read. That was good work, I must say,
and now I see where you are coming from. A couple of questions: what
polling interval would you use? Do you plan to use multicast? Anyway -

Caveats:
- The authors are the ones who really know the internals.
- Our clusters are/were Monte Carlo, so no IPC inside the algorithms.
- Our clusters had fast polling intervals (5-10 seconds).

Gmond:
- Single-threaded for metric collection; metrics are grouped into
  collection groups.
- Collection groups are polled. The sleep interval is the time from now
  until the next group that needs waking (rough sketch below).
- Sending is decoupled from collecting. A send happens after a longer
  timeout, or when a metric value exceeds its threshold (as you know).
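
To make that concrete, here is a toy sketch of the scheduling loop as I
read it -- this is my own illustration, NOT the actual gmond source, and
the group names, fields and thresholds are invented:

#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <time.h>
#include <unistd.h>

struct collection_group {
    const char *name;
    int    collect_every;    /* seconds between collections           */
    int    time_threshold;   /* max seconds between UDP sends         */
    time_t next_collect;     /* when this group next needs waking     */
    time_t last_send;
    double last_value;       /* pretend each group holds one metric   */
};

static double read_metric(const struct collection_group *g)
{
    (void)g;
    return 0.0;              /* placeholder, e.g. parse /proc/loadavg */
}

int main(void)
{
    struct collection_group groups[] = {
        { "cpu",  5, 20, 0, 0, 0.0 },
        { "mem", 10, 60, 0, 0, 0.0 },
    };
    const int    ngroups = 2;
    const double value_threshold = 1.0;
    time_t now = time(NULL);

    for (int i = 0; i < ngroups; i++) {
        groups[i].next_collect = now;
        groups[i].last_send    = now;
    }

    for (;;) {
        /* Single thread: sleep until the group that is due soonest. */
        now = time(NULL);
        time_t soonest = groups[0].next_collect;
        for (int i = 1; i < ngroups; i++)
            if (groups[i].next_collect < soonest)
                soonest = groups[i].next_collect;
        if (soonest > now)
            sleep((unsigned)(soonest - now));  /* whole-second granularity */

        now = time(NULL);
        for (int i = 0; i < ngroups; i++) {
            struct collection_group *g = &groups[i];
            if (g->next_collect > now)
                continue;

            double v = read_metric(g);                 /* collect */
            g->next_collect = now + g->collect_every;

            /* Send is decoupled from collect: only when the value moves
             * past a threshold or the time threshold expires. */
            if (v - g->last_value >  value_threshold ||
                v - g->last_value < -value_threshold ||
                now - g->last_send >= g->time_threshold) {
                printf("send %s = %f\n", g->name, v);  /* stand-in for sendto() */
                g->last_send  = now;
                g->last_value = v;
            }
        }
    }
}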

So in theory, as "now" is not explicitly aligned, gmond sleeps and UDP
sends should be stochastic. Are they? Well, no, not really.
- Using whole seconds and calling sleep() makes for really coarse gmond
  sleep intervals. gmonds with short sleep intervals will align on second
  boundaries and burst UDP together.
- When a cluster starts computing, all nodes will see a cpu load spike
  exceeding the configured threshold at much the same time.
- On a cluster with gmond multicasting, every gmond receives every other
  gmond's packets, so their computation spikes will obviously align.
  Multicast - just say no.

It should be pretty easy to see the above behaviour by snoop/tcpdumping at a
headnode.
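
For example, something like this on the headnode should show the bursts
clumping onto second boundaries (this assumes the default gmond UDP port
of 8649 - adjust interface and port to taste):

    tcpdump -n -ttt udp port 8649

The -ttt inter-packet deltas make the clumping easy to spot.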

- I have observed unexplained delays every now and again in getting XML
  data via TCP from the headnode. Headnodes that are subnet-local but not
  part of the compute cluster may be worth considering, BTW.

gmetad:
- One thread per data_source, and some other threads for this and that.
- For 10-second polling I am fairly sure that the threads end up aligning
  themselves - resonance, as the paper said. I don't know why (see the
  sketch below for one way to stagger them).
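
If I were poking at it, I would try staggering the per-data_source
threads with a random start offset. A toy sketch of that idea - this is
not gmetad code, the structure and names are made up:

#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define NSOURCES      4
#define POLL_INTERVAL 10   /* seconds, like a 10 s data_source poll */

/* One poller thread per data_source.  The point of the sketch is the
 * random start offset: without it, threads created at the same instant
 * with the same integer sleep interval keep firing together. */
static void *poll_source(void *arg)
{
    int id = (int)(long)arg;
    unsigned seed = (unsigned)time(NULL) ^ (unsigned)(id * 2654435761u);

    /* Hypothetical fix: delay the first poll by a random fraction of the
     * interval so the TCP polls do not resonate. */
    long offset_ms = rand_r(&seed) % (POLL_INTERVAL * 1000);
    struct timespec ts = { .tv_sec  = offset_ms / 1000,
                           .tv_nsec = (offset_ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);

    for (;;) {
        printf("[source %d] polling at %ld\n", id, (long)time(NULL));
        /* ... connect to the data_source, read the XML, update RRDs ... */
        sleep(POLL_INTERVAL);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NSOURCES];

    for (long i = 0; i < NSOURCES; i++)
        pthread_create(&tid[i], NULL, poll_source, (void *)i);
    for (int i = 0; i < NSOURCES; i++)
        pthread_join(tid[i], NULL);
    return 0;
}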

Summary?

Resonance and gmond load-spike synchronicity may cause you some
compute jitter on nodes. But if you want to observe load during
calculations, well, you want to observe load. And gmond is pretty
lightweight for that task; I am unaware of anything "lighter".

gmetad-level scaling problems are real but can be managed by putting the
RRDs on a SAN or tmpfs, and by grouping nodes into clusters and clusters
into grids (roughly as in the gmetad.conf sketch below).
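
For reference, the grouping is just ordinary gmetad.conf data_source
lines; something like this (host names invented, and rrd_rootdir pointed
at a tmpfs or SAN mount):

    # lower-level gmetad: one data_source per cluster
    data_source "cluster-a" 15 node-a1:8649 node-a2:8649
    data_source "cluster-b" 15 node-b1:8649 node-b2:8649

    # keep the RRD write load off spinning disk
    rrd_rootdir "/var/lib/ganglia/rrds"

    # top-level gmetad: poll the lower gmetads' XML ports (default 8651)
    data_source "grid-west" 60 gmetad-west:8651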

Aside: fast Ganglia polling requires source hacks to remove the statically
defined, long sleeps in several places (and to add jitter to the threads).
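
The jitter part can be as dumb as swapping a bare sleep(n) for something
like this - a sketch of the idea only, not an actual patch:

#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <time.h>

/* Sleep n seconds plus a random sub-second fraction, so identically
 * configured daemons drift apart instead of waking on the same second
 * boundary. */
static void sleep_jittered(unsigned seconds, unsigned *seed)
{
    struct timespec ts;
    ts.tv_sec  = (time_t)seconds;
    ts.tv_nsec = rand_r(seed) % 1000000000L;
    nanosleep(&ts, NULL);
}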

- richard
