One of our end users noticed this issue at the beginning of March. They hadn't used matrix multiplication with the foss/2018b modules before then.
If you want to start collecting all programs/scripts/benchmarks in a central repository, you can start with the scripts in my GitHub repo:
https://github.com/eylenth/Openblas_matrix_issue

If a "module swap" to the newer OpenBLAS version is implemented in EasyBuild v3.9.1, then the OpenBLAS versions in R_HOME/lib64/R/etc/ldpaths of R modules that were compiled with the 'broken' OpenBLAS need to be modified.

Regards
Thomas Eylenbosch

-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of Åke Sandgren
Sent: Wednesday, 8 May 2019 20:05
To: [email protected]
Subject: Re: [easybuild] Openblas(foss) matrix issue

On 5/8/19 7:56 PM, Kenneth Hoste wrote:
> Thank you for reporting back on this Thomas!
>
> It's good to hear that the issue can be resolved by using a newer
> version of OpenBLAS, but it's also frustrating...
>
> This is clearly a bug in OpenBLAS that could have been prevented.
> I haven't studied this issue in detail myself yet, but I have seen
> comments pass by from OpenBLAS maintainers who say they don't have
> Skylake hardware to test on.
> That makes me wonder how well the rest of OpenBLAS is tested, which is
> a bit infuriating for a library that important.
>
> On the EasyBuild side, I think we have a couple of options for mitigation:
>
> 1) Add easyconfigs for the latest version of OpenBLAS to the next
> EasyBuild release (v3.9.1) which can be used to swap out the OpenBLAS
> included in recent foss toolchains.
> I suspect simply doing a "module swap" to the newer OpenBLAS version
> is sufficient in most cases (if OpenBLAS was not statically linked,
> and if RPATH is not used).
>
> 2) Modify the toolchain definition of foss/2018b (and foss/2019a?) to
> use the newer OpenBLAS version.
> I'm not sure if this is too drastic or not, but it would be up to each
> site to decide whether or not they want to update their already
> installed foss modules to pick up on this.
>
> 3) Collect test programs/scripts/benchmarks in a central repository
> (easybuild-testing?), so we can assess the stability of future
> OpenBLAS versions that we consider for inclusion in the 'foss' toolchains.
>
> You could state that this isn't our 'job', but if the OpenBLAS
> maintainers are not capable of properly testing their releases on
> recent hardware, then I guess it's our duty to try and catch problems
> like this ourselves before they blow up in our faces weeks (or months) later.
>
> Anyone who would be up for helping out with this?
> For now we should definitely focus on covering this OpenBLAS issue
> well, but I can see this thing growing out as another central repo
> where we pool together efforts done on testing/benchmarking on top of
> modules installed with EasyBuild...
>
>
> I'm a bit surprised that these problems didn't arise earlier...
> foss/2018b has been defined a fairly long time ago (early July 2018),
> and this toolchain has been picked up quite long (based on incoming
> contributions).
> So why did these problems only start surfacing in recent weeks? Does
> anyone have a plausible explanation?
> Note that I'm genuinely wondering here, I'm not trying to insinuate
> anything...

One reason it hasn't shown up earlier is that it mainly affects slightly larger matrices, and maybe also only in certain circumstances.

That's the feeling I got from the user here who had problems. Small to medium sizes had no problem, only fairly large ones showed any problems...

And we would definitely help in testing...

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: [email protected] Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
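
As an illustration of the kind of test program the thread proposes collecting, here is a minimal sketch of a randomized product check in the style of Freivalds' algorithm: instead of recomputing A*B in full, it compares A*(B*x) with C*x for a few random vectors x. This is not taken from the linked repository. All function names are hypothetical, the matrices are plain Python lists of integers so the comparison is exact, and the vector entries are drawn from 1..1000 rather than the classic 0/1.

```python
import random


def naive_matmul(a, b):
    """Reference product of two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def matvec(m, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def random_int_matrix(rows, cols, rng):
    """Matrix with small random integer entries (exact arithmetic, no rounding)."""
    return [[rng.randint(-9, 9) for _ in range(cols)] for _ in range(rows)]


def freivalds_check(a, b, c, rounds=10, seed=0):
    """Return True if c is consistent with a*b for `rounds` random vectors.

    Each round costs only three matrix-vector products, so the check stays
    cheap even for the fairly large matrices where the problem was reported.
    """
    rng = random.Random(seed)
    n = len(b[0])
    for _ in range(rounds):
        # Assumption of this sketch: strictly positive integer entries.
        x = [rng.randint(1, 1000) for _ in range(n)]
        if matvec(a, matvec(b, x)) != matvec(c, x):
            return False
    return True
```

In a real test, `c` would come from the BLAS-backed routine under test rather than from `naive_matmul`, and floating-point data would need a tolerance instead of exact equality.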

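
The ldpaths change Thomas mentions could be scripted along these lines. This is only a sketch under stated assumptions: that the file contains install paths of the form `OpenBLAS/<version>...` (as with EasyBuild's default naming scheme), and that swapping the version string is all that is needed. The function name is hypothetical, and the version numbers in the usage note are example values, not taken from this thread.

```python
import re


def swap_openblas_version(ldpaths_text, old_version, new_version):
    """Replace 'OpenBLAS/<old_version>' with 'OpenBLAS/<new_version>'.

    Returns a tuple (new_text, number_of_replacements). The lookahead keeps
    a version such as 0.3.1 from matching inside a longer one such as 0.3.10.
    """
    pattern = re.compile(r"(OpenBLAS/)" + re.escape(old_version) + r"(?![0-9.])")
    return pattern.subn(lambda m: m.group(1) + new_version, ldpaths_text)
```

For example, `swap_openblas_version(text, "0.3.1", "0.3.5")` would rewrite only the version part of each matching path. Anything after the version (such as a toolchain suffix) is left untouched, so the resulting directory name should be checked against the actual installation before writing the file back.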
