One of our end users noticed this issue at the beginning of March. They had 
not run matrix multiplications with the foss/2018b modules before then.

If you want to start collecting all programs/scripts/benchmarks in a central 
repository, you can start with the scripts in my GitHub repo: 
https://github.com/eylenth/Openblas_matrix_issue

If a "module swap" to the newer OpenBLAS version is the approach taken in 
EasyBuild/3.9.1, then the OpenBLAS versions in R_HOME/lib64/R/etc/ldpaths of 
R modules that were compiled with the 'broken' OpenBLAS also need to be 
modified. 
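
As a rough illustration of the ldpaths change, here is a minimal Python sketch 
that rewrites the OpenBLAS version inside the text of such a file. The function 
name, the sample path, and the version numbers are placeholders I made up for 
the example; the real layout of ldpaths and of the install prefix may differ 
per site, and the toolchain suffix after the version may need changing too.

```python
import re


def swap_openblas_version(ldpaths_text, old_version, new_version):
    """Replace 'OpenBLAS/<old_version>' with 'OpenBLAS/<new_version>'.

    Hypothetical helper: it assumes the OpenBLAS install prefix appears in
    the text as '.../OpenBLAS/<version>...'.
    """
    # The lookahead stops '0.3.1' from also matching inside '0.3.10'.
    pattern = re.compile(
        r"(OpenBLAS/)" + re.escape(old_version) + r"(?=[-/:\s]|$)"
    )
    # A function replacement avoids backslash handling in new_version.
    return pattern.sub(lambda m: m.group(1) + new_version, ldpaths_text)


if __name__ == "__main__":
    # Made-up sample line; not copied from a real ldpaths file.
    line = "/apps/software/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib"
    print(swap_openblas_version(line, "0.3.1", "0.3.5"))
```

It would be wise to keep a backup of ldpaths and to check the result by hand 
before applying this to installed R modules.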


Regards
Thomas Eylenbosch


-----Original Message-----
From: [email protected] <[email protected]> On 
Behalf Of Åke Sandgren
Sent: Wednesday 8 May 2019 20:05
To: [email protected]
Subject: Re: [easybuild] Openblas(foss) matrix issue



On 5/8/19 7:56 PM, Kenneth Hoste wrote:
> Thank you for reporting back on this Thomas!
> 
> It's good to hear that the issue can be resolved by using a newer 
> version of OpenBLAS, but it's also frustrating...
> 
> This is clearly a bug in OpenBLAS that could have been prevented.
> I haven't studied this issue in detail myself yet, but I have seen 
> comments pass by from OpenBLAS maintainers who say they don't have 
> Skylake hardware to test on.
> That makes me wonder how well the rest of OpenBLAS is tested, which is 
> a bit infuriating for a library that important.
> 
> On the EasyBuild side, I think we have a couple of options for mitigation:
> 
> 1) Add easyconfigs for the latest version of OpenBLAS to the next 
> EasyBuild release (v3.9.1) which can be used to swap out the OpenBLAS 
> included in recent foss toolchains.
> I suspect simply doing a "module swap" to the newer OpenBLAS version 
> is sufficient in most cases (if OpenBLAS was not statically linked, 
> and if RPATH is not used).
> 
> 2) Modify the toolchain definition of foss/2018b (and foss/2019a?) to 
> use the newer OpenBLAS version.
> I'm not sure if this is too drastic or not, but it would be up to each 
> site to decide whether or not they want to update their already installed 
> foss modules to pick up on this.
> 
> 3) Collect test programs/scripts/benchmarks in a central repository 
> (easybuild-testing?), so we can assess the stability of future 
> OpenBLAS versions that we consider for inclusion in the 'foss' toolchains.
> 
> You could state that this isn't our 'job', but if the OpenBLAS 
> maintainers are not capable of properly testing their releases on 
> recent hardware, then I guess it's our duty to try and catch problems 
> like this ourselves before they blow up in our faces weeks (or months) later.
> 
> Anyone who would be up for helping out with this?
> For now we should definitely focus on covering this OpenBLAS issue 
> well, but I can see this growing into another central repo 
> where we pool together efforts done on testing/benchmarking on top of 
> modules installed with EasyBuild...
> 
> 
> I'm a bit surprised that these problems didn't arise earlier...
> foss/2018b was defined a fairly long time ago (early July 2018), 
> and this toolchain has been in use for quite a while (based on incoming 
> contributions).
> So why did these problems only start surfacing in recent weeks? Does 
> anyone have a plausible explanation?
> Note that I'm genuinely wondering here, I'm not trying to insinuate 
> anything...

One reason it hasn't shown up earlier is that it mainly affects slightly 
larger matrices, and maybe also only in certain circumstances.
That's the feeling I got from the user here who had problems.
Small to medium sizes had no problem; only fairly large ones showed any 
problems...
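
Since the failures seem to depend on matrix size, a test for the central repo 
could sweep over sizes and compare each product against a slow but simple 
reference. Below is a minimal Python sketch of that idea. All names 
(naive_matmul, check_matmul, the tolerance, the sizes) are my own assumptions 
and are not taken from any existing test suite.

```python
import random


def naive_matmul(a, b):
    """Reference product of two list-of-lists matrices (triple loop)."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def max_abs_diff(x, y):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(u - v) for rx, ry in zip(x, y) for u, v in zip(rx, ry))


def check_matmul(matmul, sizes, tol=1e-8, seed=0):
    """Return (size, max_diff) for each size where matmul deviates."""
    rng = random.Random(seed)
    failures = []
    for n in sizes:
        a = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
        b = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
        diff = max_abs_diff(matmul(a, b), naive_matmul(a, b))
        if diff > tol:
            failures.append((n, diff))
    return failures


if __name__ == "__main__":
    # Self-check with the reference itself; an empty list means no deviation.
    print(check_matmul(naive_matmul, [4, 16, 64]))
```

To exercise a BLAS-backed product, one could pass something like 
`lambda a, b: numpy.dot(a, b).tolist()`, assuming NumPy is installed and 
linked against the OpenBLAS under test. The pure-Python reference is slow, so 
for really large matrices it would have to be replaced by a trusted BLAS 
build.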

And we would definitely help in testing...

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: [email protected]   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
