Thank you Carlos! You did a great job figuring out this fix :)

I can confirm that after applying this patch in our cluster the issue seems
to be solved for us. Now we pass these tests with
"OpenBLAS/0.3.1-GCC-7.3.0-2.30":
https://github.com/eylenth/Openblas_matrix_issue
https://github.com/xianyi/BLAS-Tester

I also got a confirmation from a colleague in our user support team that a
problem he was trying to debug with some R code is solved after this fix
was applied.

I have sent a PR with the fix upstream:
https://github.com/easybuilders/easybuild-easyconfigs/pull/8396

If anyone else tests the workaround, it would be nice if you could report
on the mailing list or in the pull request on GitHub whether it works for
you too.

regards,
Pablo


On Tue, May 28, 2019 at 2:32 PM Carlos Fenoy <[email protected]> wrote:

> Hi,
>
> After fighting a long time with this, we managed to get a solution that
> passes both the "Openblas_matrix_issue" and "BLAS_tester" test suites.
>
> To solve the issue we had to apply a patch and add a new build parameter
> (USE_SIMPLE_THREADED_LEVEL3=1) to OpenBLAS to make it work with multiple
> openmp threads.
>
> This is what the buildopts line looks like for us:
>
> buildopts = ' USE_SIMPLE_THREADED_LEVEL3=1 BINARY=64 USE_THREAD=1
> USE_OPENMP=1 CC="$CC" FC="$F77" DYNAMIC_ARCH=1'
>
> We took the patch from this commit in the OpenBLAS repo:
> https://github.com/xianyi/OpenBLAS/commit/b14f44d2adbe1ec8ede0cdf06fb8b09f3c4b6e43
> (you can get the patch by adding .patch at the end of the URL)
>
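
As a rough illustration of the two changes described above, the relevant
part of an easyconfig might look like the fragment below. This is only a
sketch: the patch file name is hypothetical (it stands for the commit URL
above with ".patch" appended, saved next to the easyconfig), and the
existing easyconfig may already list other patches that would need to be
kept.

```python
# Fragment of an OpenBLAS 0.3.1 easyconfig; only the changed parameters are shown.

# Hypothetical file name, not taken from the thread: the content is the
# OpenBLAS commit linked above, downloaded with ".patch" added to the URL.
patches = ['OpenBLAS-0.3.1_threaded-level3-fix.patch']

# Same options as in the buildopts line quoted above, split over several
# lines for readability (adjacent string literals are concatenated).
buildopts = (
    ' USE_SIMPLE_THREADED_LEVEL3=1'
    ' BINARY=64 USE_THREAD=1 USE_OPENMP=1'
    ' CC="$CC" FC="$F77" DYNAMIC_ARCH=1'
)
```
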
> Regards,
> Carlos
>
> On Mon, May 27, 2019 at 6:15 PM Pablo Escobar Lopez <
> [email protected]> wrote:
>
>> Hi,
>>
>> Did anyone find a working patch or workaround for the matrix issue when
>> using OpenBLAS-0.3.1?
>>
>> After a lot of trial and error I couldn't pass the tests in
>> https://github.com/eylenth/Openblas_matrix_issue when using
>> https://github.com/easybuilders/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/o/OpenBLAS/OpenBLAS-0.3.1-GCC-7.3.0-2.30.eb
>> no matter what patches, toolchainopts or buildopts I used (and I have tried
>> a few different combinations). Is anyone able to pass the tests using
>> openblas-0.3.1?
>>
>> I could pass the tests using openblas-0.3.5 but upgrading my foss/2018b
>> toolchain would be quite messy because I use RPATH. The least intrusive
>> solution for my users would be to patch openblas-0.3.1 somehow,
>> but I couldn't find a working solution. Any suggestions?
>>
>> regards,
>> Pablo.
>>
>> P.S. On a related topic, IMHO unless there is a proper workaround I would
>> suggest we stop providing openblas-0.3.1 with easybuild. Right now we are
>> distributing a broken library.
>>
>>
>> On Tue, May 7, 2019 at 6:34 PM Mikael Öhman <[email protected]> wrote:
>>
>>> Hi Thomas,
>>>
>>> I can also confirm these issues. I tried rebuilding OpenBLAS+R after the
>>> fix in #7180, but I still saw the same problems.
>>> Very large matrix-matrix multiplications randomly gave the wrong result.
>>> Very large errors. The larger the matrix, the more frequent the errors.
>>>
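
A rough sanity check for this kind of failure could look like the sketch
below. It is not the test from the Openblas_matrix_issue repository, and
the function names are made up for illustration. It relies on the fact
that multiplying a matrix by the identity must reproduce the matrix
exactly, whatever order the backend sums in, so any mismatch cannot be
blamed on rounding. It assumes a NumPy build that is linked against the
OpenBLAS under test.

```python
import random


def identity(n):
    """Return an n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def count_bad_products(matmul, n, repeats=5, seed=0):
    """Multiply a random n x n matrix by the identity `repeats` times.

    Returns how many of the products differ from the original matrix.
    """
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    eye = identity(n)
    bad = 0
    for _ in range(repeats):
        product = matmul(a, eye)
        if [list(row) for row in product] != a:
            bad += 1
    return bad


def numpy_matmul(a, b):
    """Matrix product through NumPy, which hands the work to its BLAS library."""
    import numpy as np  # imported lazily so the helpers above need only the stdlib
    return (np.asarray(a) @ np.asarray(b)).tolist()
```

With a healthy library, a call such as count_bad_products(numpy_matmul, 2000)
would be expected to return 0. The size 2000 is an arbitrary choice and may
need to be larger to trigger the problem. A non-zero count suggests the BLAS
backend is returning wrong data.
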
>>> In the end, I compiled an Intel version (but I had to remove a few
>>> extensions that didn't build) and removed my foss version from our
>>> installations.
>>>
>>> Perhaps it's related to hardware; I saw this happen on Skylake servers.
>>> I haven't had time to check whether this
>>> https://github.com/easybuilders/easybuild-easyconfigs/issues/8197
>>> also affects 0.3.1.
>>>
>>> Best regards, Mikael
>>>
>>>
>>> On Tue, May 7, 2019 at 6:12 PM Thomas Eylenbosch <
>>> [email protected]> wrote:
>>>
>>>> Hello
>>>>
>>>>
>>>>
>>>> Some of our end users reported a calculation issue with matrices when
>>>> they are working with a foss/2018b module
>>>>
>>>>
>>>>
>>>> I reproduced this error with Python and R compiled with the
>>>> foss/2018b toolchain; the output contains unexpected results.
>>>>
>>>> Then I ran the same test with Python and R compiled with the
>>>> foss/2016b toolchain, and there I get the expected behavior.
>>>>
>>>>
>>>>
>>>> You can reproduce this error with the following github repository:
>>>>
>>>> https://github.com/eylenth/Openblas_matrix_issue
>>>>
>>>>
>>>>
>>>> I have also tried to recompile the OpenBLAS-0.3.1-GCC-7.3.0-2.30.eb
>>>> easyconfig file with "toolchainopts = {'vectorize': False}" (cf.
>>>> https://github.com/easybuilders/easybuild-easyconfigs/issues/7180),
>>>>
>>>> but it is still giving me unexpected behavior.
>>>>
>>>>
>>>>
>>>> Can someone try to reproduce the error with the R/Python (foss/2018b)
>>>> modules, or give me feedback on this?
>>>>
>>>>
>>>>
>>>> Thank you in advance.
>>>>
>>>>
>>>>
>>>> Met vriendelijke groet / Kind regards / Beste Grüße
>>>>
>>>> *Thomas Eylenbosch*
>>>>
>>>> *Ext: Gluo N.V.*
>>>>
>>>>
>>>>
>>>> BASF Agricultural Solutions Belgium NV
>>>>
>>>> Technologiepark 101
>>>>
>>>> B-9052 Ghent (Zwijnaarde)
>>>>
>>>> BELGIUM
>>>>
>>>> E-mail:  *[email protected]
>>>> <[email protected]>*
>>>>
>>>> BASF Agricultural Solutions Belgium NV, Registered Office: 9052 Gent,
>>>> Belgium
>>>>
>>>> Registration: RPR Gent: 0685.756.742
>>>>
>>>>
>>>>
>>>
>>
>> --
>> Pablo Escobar López
>> Linux/HPC systems engineer
>> sciCORE, University of Basel
>> SIB Swiss Institute of Bioinformatics
>>
>

-- 
Pablo Escobar López
Linux/HPC systems engineer
sciCORE, University of Basel
SIB Swiss Institute of Bioinformatics
