Package: octave Version: 4.4.1-4 Severity: grave X-Debbugs-CC: debian-science@lists.debian.org, ido...@neto.net.il
Dear octave maintainer, I received an astonishing bug report[1] saying that MKL returns wrong result for matrix multiplication. However, my further investigation suggests that the problem is very likely a threading bug of octave. OpenMP is highly suspected according to my following experimental results. ======================================================================= Please reproduce the problem with the following octave script: 921193.m ``` x=1:100000; y=[x;x]*[x;x]'; disp(reshape(y, 1, 4)) ``` The correct result is 333338333350000 333338333350000 333338333350000 333338333350000 However sometimes octave yields random results, which is possibly a symptom of race condition between threads or alike. ------------------------ MKL ------------------------------------------ libblas.so, libblas.so.3 = libmkl_rt.so $ octave 921193.m 1.1033e+15 1.1033e+15 1.1038e+15 1.1038e+15 $ MKL_NUM_THREADS=1 octave 921193.m 333338333350000 333338333350000 333338333350000 333338333350000 $ MKL_NUM_THREADS=2 octave 921193.m 642428448891136 642428448891136 642428448891136 642428448891136 $ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=intel octave 921193.m 489859913436624 489859913436624 488279025495504 488279025495504 $ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=gnu octave 921193.m 333338333350000 333338333350000 333338333350000 333338333350000 $ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=tbb octave 921193.m 333338333350000 333338333350000 333338333350000 333338333350000 $ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=sequential octave 921193.m 333338333350000 333338333350000 333338333350000 333338333350000 The default threading model used by libmkl_rt is the "intel" threading aka. libiomp5 . It seems that Octave isn't happy with libiomp5 . In contrast, gomp (gnu), tbb, serial/sequential threading users are not affected. ----------------------- Netlib ------------------------------------------ libblas.so, libblas.so.3 = netlib blas $ octave 921193.m 824104476280848 824104476280848 828286951663952 828286951663952 $ octave 921193.m 333338333350000 333338333350000 333338333350000 333338333350000 $ OMP_NUM_THREADS=1 octave 921193.m 333338333350000 333338333350000 333338333350000 333338333350000 Netlib blas has no multi-thread implementation. This result indicates that octave's multi-threading functionality is questionable. ---------------------- BLIS (openmp) ------------------------------------ libblas.so, libblas.so.3 = blis (openmp). Please note that BLIS use single thread by default even if it's compiled with openmp/pthread threading model. $ octave 921193.m 757875851200128 757875851200128 796692912410048 796692912410048 $ BLIS_NUM_THREADS=1 octave 921193.m 531523819460688 531523819460688 543945552290544 543945552290544 $ OMP_NUM_THREADS=1 octave 921193.m 333338333350000 333338333350000 333338333350000 333338333350000 Again, octave's multi-threading functionality is questionable. ------------------- OpenBLAS --------------------------------------- libblas.so, libblas.so.3 = openblas $ octave 921193.m 907773323384928 907773323384928 925793789579664 925793789579664 $ octave 921193.m 333338333350000 333338333350000 333338333350000 333338333350000 $ OPENBLAS_NUM_THREADS=1 octave 921193.m 737565604371072 737565604371072 741382086552384 741382086552384 $ OMP_NUM_THREADS=1 octave 921193.m 333338333350000 333338333350000 333338333350000 333338333350000 ========================================================================= According to the above experimental results, I think BLAS libraries are innocent. [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=921193