Douglas Nordwall wrote:
> Thanks for all the input. Perhaps unsurprisingly, we are already
> looking at most of this for dealing with a large cluster, but I'm
> happy to see that we're not the biggest :)
>
> The RRD issue is good to know. We have a _very_ large disk array that
> we can write to, and I already ordered the collection server with 32G
> of RAM. So, that should easily be bypassed. We're already going to
> break it up into chunks.
>
> Sadly, I don't think anyone has been able to answer my question about
> jitter. It's not a simple overhead issue... you could have something
> that is 1ms worth of overhead on each node per minute. But if that 1ms
> of overhead prevents the calculation from completing, that 1ms impacts
> all of the nodes. Then another one hits for 1ms 1ms later, again
> slowing down the calculation.

These days, the term 'cluster' means different things to different
people. In the paper you quoted, it means a tightly-coupled cluster
(parallel, distributed memory, MPI/PVM/etc). Many of us also use the
term 'cluster' to describe grids -- completely separate systems managed
with a batch system such as LSF.

The 'jitter' issue only exists in the former sort of cluster, since
there is no expectation of synchronous execution in the latter sort.

We have some big clusters of the latter type, with a few thousand CPUs,
and I can confirm that the advice given is all appropriate. In our case,
ganglia introduces no measurable change in throughput.

However, I can't advise at all on the jitter issue! I would suspect that
it is likely to be a real problem...

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
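
To put rough numbers on why jitter differs from plain overhead in a tightly-coupled cluster, here is a toy model. It is only a sketch under stated assumptions, not a measurement of ganglia: every node computes for `step_ms` and then waits at a barrier, so a step is delayed by `noise_ms` whenever at least one node is interrupted during it, and each node is interrupted independently once per `interval_ms` on average. All function names and parameter values are made up for illustration; only the 1ms-per-minute figure comes from the quoted message.

```python
import random


def per_node_overhead(noise_ms, interval_ms):
    """Fraction of one node's time lost to the interruption (no synchronization)."""
    return noise_ms / interval_ms


def expected_sync_slowdown(nodes, step_ms, noise_ms, interval_ms):
    """Expected fractional slowdown when every step ends in a barrier.

    Assumption: each node is hit independently with probability
    step_ms / interval_ms per step, and one hit anywhere delays the
    whole step by noise_ms.
    """
    p_hit = step_ms / interval_ms
    p_any = 1.0 - (1.0 - p_hit) ** nodes
    return p_any * noise_ms / step_ms


def simulate_sync_slowdown(nodes, step_ms, noise_ms, interval_ms, steps, seed=0):
    """Monte Carlo version of the same toy model."""
    rng = random.Random(seed)
    p_hit = step_ms / interval_ms
    delayed = 0
    for _ in range(steps):
        # The barrier waits for the slowest node, so one hit is enough.
        if any(rng.random() < p_hit for _ in range(nodes)):
            delayed += 1
    return (delayed * noise_ms) / (steps * step_ms)


if __name__ == "__main__":
    # Hypothetical workload: 10 ms compute steps, 1 ms of noise per minute.
    for n in (1, 100, 10000):
        slowdown = expected_sync_slowdown(n, 10.0, 1.0, 60000.0)
        print(f"{n:>6} nodes: expected slowdown {slowdown:.4%}")
    print(f"per-node overhead: {per_node_overhead(1.0, 60000.0):.4%}")
```

In this model a single node loses `noise_ms / interval_ms` of its time, but the synchronized job loses close to `noise_ms / step_ms` once the node count is large enough that some node is almost always interrupted. Batch-style grids, where jobs do not wait on each other, only see the per-node figure.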

