Douglas Nordwall wrote:

> thanks for all the input. Perhaps unsurprisingly, we are already 
> looking at most of this for dealing with a large cluster, but I'm 
> happy to see that we're not the biggest :)
>
> The RRD issue is good to know. We have a _very_ large disk array that 
> we can write to, and I already ordered the collection server with 32G 
> of RAM. So, that should easily be bypassed.  We're already going to 
> break it up into chunks.
>
> Sadly, I don't think anyone has been able to answer my question about 
> jitter. It's not a simple overhead issue... you could have something 
> that is 1ms worth of overhead on each node per minute. But if that 1ms 
> of overhead prevents the calculation from completing, that 1ms impacts 
> all of the nodes. Then another one hits for 1ms, 1ms later, again 
> slowing down the calculation.


These days, the term 'cluster' means different things to different 
people.  In the paper you quoted,  it means a tightly-coupled cluster 
(parallel, distributed memory, MPI/PVM/etc).  Many of us also use the 
term 'cluster' to describe grids -- completely separate systems managed 
with a batch system such as LSF.

The 'jitter' issue only exists in the former sort of cluster, since 
there is no expectation of synchronous execution in the latter sort of 
cluster.
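
To make the difference concrete, here is a rough toy model in Python. 
It is only a sketch under made-up assumptions: every name and number in 
it (simulate, work_ms, jitter_ms, p_jitter) is hypothetical, not 
measured from Ganglia or from any real cluster. Each node does a fixed 
slice of work per step and is occasionally interrupted. In the 
tightly-coupled case every step ends at a barrier, so the step costs as 
much as the slowest node; in the batch case each node just pays for its 
own interruptions.

```python
import random


def simulate(nodes, steps, work_ms=1.0, jitter_ms=1.0, p_jitter=0.01, seed=0):
    """Toy model of per-node interruptions (all parameters are assumed).

    Returns (synchronous_total_ms, independent_mean_total_ms).
    """
    rng = random.Random(seed)
    sync_total = 0.0
    per_node_total = [0.0] * nodes
    for _ in range(steps):
        # Each node's step time: base work plus an occasional interruption.
        times = [
            work_ms + (jitter_ms if rng.random() < p_jitter else 0.0)
            for _ in range(nodes)
        ]
        # Tightly coupled: all nodes wait at the barrier for the slowest one.
        sync_total += max(times)
        # Batch/grid: each node only pays for its own interruptions.
        for i, t in enumerate(times):
            per_node_total[i] += t
    independent_mean = sum(per_node_total) / nodes
    return sync_total, independent_mean


if __name__ == "__main__":
    steps = 1000
    ideal_ms = steps * 1.0  # run time with no interruptions at all
    for n in (1, 16, 256, 1024):
        sync, indep = simulate(nodes=n, steps=steps)
        print(f"{n:5d} nodes: synchronous {sync / ideal_ms:.2f}x, "
              f"independent {indep / ideal_ms:.2f}x of ideal")
```

In this model the independent case stays close to the ideal run time 
however many nodes there are, while the synchronous case gets worse as 
the node count grows, because the chance that at least one node is 
interrupted in any given step approaches 1. Real codes do not 
necessarily synchronise on every step, so treat this as an illustration 
of the mechanism rather than a prediction.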

We have some big clusters of the latter type, with a few thousand 
CPUs, and I can confirm that the advice given is all appropriate.  In 
our case, Ganglia introduces no measurable change in throughput.  
However, I can't advise at all on the jitter issue!  I would suspect 
that it is likely to be a real problem...


