Ciao Riccardo,

On 16 Jan, 2013, at 18:26, Riccardo Murri wrote:
> Back in January last year, the sysadmins on the HPC cluster updated
> the BIOS on the compute nodes; after the upgrade, the performance of
> my integer matrix code dropped by some 30%.

This is indeed an interesting story, though EasyBuild can't help much here,
other than to prove that it was not the build path of the code that triggered
the performance difference.
Which is already a great deal, I'd say.

On a related note, this could also be caught by running automated benchmarks (*);
these are invaluable exactly when you enter and exit maintenance windows:
- running benchmark codes can serve as a validity test against system changes!
In our case, we will eventually settle on a "top 10" of codes that are
relevant/indicative for our community and run them continuously in a
"best-effort" queue, like:
https://www.nersc.gov/users/job-information/hopper-benchmark-monitoring/
Of course EasyBuild will play a prominent role in this business, because
what we want to test is not only the performance of the applications per se
but also the binaries produced by fresher builds (yes, there are trade-offs here).
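
To make the "validity test" idea concrete, here is a minimal sketch in Python.
It is not an existing EasyBuild or HPCBIOS tool; the function names, the 10%
threshold and the timings are all made up for illustration. It compares the
median wall-clock time of each benchmark before and after a maintenance window
and reports the ones that got noticeably slower:

```python
from statistics import median


def slowdown(baseline_times, current_times):
    """Relative change in median runtime; 0.30 means 30% slower."""
    base = median(baseline_times)
    cur = median(current_times)
    return (cur - base) / base


def find_regressions(baseline, current, threshold=0.10):
    """Return {benchmark: slowdown} for benchmarks slower than `threshold`.

    `baseline` and `current` map a benchmark name to a list of wall-clock
    times in seconds. The 10% default threshold is an arbitrary assumption.
    """
    regressions = {}
    for name, base_times in baseline.items():
        cur_times = current.get(name)
        if not cur_times:
            continue  # benchmark was not run after the change
        change = slowdown(base_times, cur_times)
        if change > threshold:
            regressions[name] = change
    return regressions


# Hypothetical timings (seconds) before and after a maintenance window.
before = {"int_matrix": [100.0, 102.0, 98.0], "io_test": [50.0, 51.0, 49.0]}
after = {"int_matrix": [130.0, 131.0, 129.0], "io_test": [50.5, 50.0, 51.0]}

result = find_regressions(before, after)
for name, change in result.items():
    print(f"{name}: {change:.0%} slower than baseline")
```

Using the median rather than a single run keeps one noisy job in a
"best-effort" queue from raising a false alarm.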

We basically want to give users the whole spectrum of choice
between guaranteed stability and performance... and to give the sysadmin team
a robust way to detect annoying things like "BIOS-collateral-damage"...

(*) One third of my pet project HPCBIOS is about that kind of stuff.

Let's take this in private if people here are interested in this topic.
(catch me at FOSDEM 2013 about it btw :)

cheerio,
Fotis
