On Tue, 2019-05-28 at 01:37 -0700, Mo Zhou wrote:
> Hi Gentoo devs,
>
> Classical numerical linear algebra libraries, BLAS[1] and LAPACK[2],
> play important roles in the scientific computing field, as many
> software packages such as Numpy, Scipy, Julia, Octave and R are built
> upon them.
>
> There is a standard implementation of BLAS and LAPACK, named netlib
> or simply the "reference implementation". This implementation has
> been provided by Gentoo's main repo. However, it has a major problem:
> performance. On the other hand, a number of well-optimized
> BLAS/LAPACK implementations exist, including OpenBLAS (free), BLIS
> (free), MKL (non-free), etc., but none of them has been properly
> integrated into the Gentoo distribution.
>
> I'm writing to propose a good solution to this problem. If no Gentoo
> developer objects to this proposal, I'll keep moving forward and
> start submitting PRs to the Gentoo main repo.
>
> Historical Obstacle
> -------------------
>
> Different BLAS/LAPACK implementations are expected to be compatible
> with each other at both the API and ABI level, so they can be used
> as drop-in replacements for one another. This sounds nice, but the
> difference in SONAME has hampered the Gentoo integration of the
> well-optimized ones.
>
> Assume a Gentoo user compiled a pile of packages on top of the
> reference BLAS and LAPACK, i.e. these reverse dependencies are linked
> against libblas.so.3 and liblapack.so.3. When the user discovers that
> OpenBLAS provides much better performance, they have to recompile
> the whole reverse dependency tree in order to take advantage of
> OpenBLAS, because the SONAME of OpenBLAS is libopenblas.so.0. When
> the user wants to try MKL (libmkl_rt.so), they have to recompile the
> whole reverse dependency tree again.
>
> This is not friendly to our earth.
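A toy model may make the SONAME argument above concrete. When a program is linked against a library, the linker records that library's SONAME as a DT_NEEDED entry, and the dynamic loader later looks the library up by that recorded name. A reverse dependency therefore keeps working after a switch only if the new provider still supplies every name it recorded. The `must_relink` helper and the package list below are hypothetical illustrations; the model does not inspect real ELF files and ignores symbol versioning.

```python
def must_relink(revdeps, provided):
    """Return the packages that break when only `provided` SONAMEs exist.

    revdeps:  maps a package to the SONAMEs its binaries recorded as
              DT_NEEDED entries at link time
    provided: SONAMEs available after switching implementations
    The loader looks libraries up by these recorded names, so a package
    keeps working only if every name it needs is still provided.
    """
    available = set(provided)
    return sorted(pkg for pkg, needed in revdeps.items()
                  if not set(needed) <= available)


# Hypothetical reverse dependencies, all built against reference BLAS/LAPACK.
revdeps = {
    "numpy": ["libblas.so.3", "liblapack.so.3"],
    "octave": ["libblas.so.3", "liblapack.so.3"],
    "R": ["libblas.so.3", "liblapack.so.3"],
}

# Status quo: OpenBLAS ships only its own SONAME.
print(must_relink(revdeps, ["libopenblas.so.0"]))  # ['R', 'numpy', 'octave']

# Proposal: OpenBLAS additionally ships libblas.so.3/liblapack.so.3 candidates.
print(must_relink(revdeps, ["libopenblas.so.0", "libblas.so.3", "liblapack.so.3"]))  # []
```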
> > Goal > ---- > > * When a program is linked against libblas.so or liblapack.so > provided by any BLAS/LAPACK provider, the eselect-based solution > will allow user to switch the underlying library without > recompiling > anything. > > * When a program is linked against a specific implementation, e.g. > libmkl_rt.so, the solution doesn't break anything. > > Solution > -------- > > Similar to Debian's update-alternatives mechanism, Gentoo's eselect > is good at dealing with drop-in replacements as well. My preliminary > investigation suggests that eselect is enough for enabling > BLAS/LAPACK > runtime switching. Hence, the proposed solution is eselect-based: > > * Every BLAS/LAPACK implementation should provide generic library > and eselect candidate libraries at the same time. Taking netlib, > BLIS and OpenBLAS as examples: > > reference: > > usr/lib64/blas/reference/libblas.so.3 (SONAME=libblas.so.3) > -- default BLAS provider > -- candidate of the eselect "blas" unit > -- will be symlinked to usr/lib64/libblas.so.3 by eselect > > usr/lib64/lapack/reference/liblapack.so.3 > (SONAME=liblapack.so.3) > -- default LAPACK provider > -- candidate of the eselect "lapack" unit > -- will be symlinked to usr/lib64/liblapack.so.3 by eselect > > blis (doesn't provide LAPACK): > > usr/lib64/libblis.so.2 (SONAME=libblis.so.2) > -- general purpose > > usr/lib64/blas/blis/libblas.so.3 (SONAME=libblas.so.3) > -- candidate of the eselect "blas" unit > -- will be symlinked to usr/lib64/libblas.so.3 by eselect > -- compiled from the same set of object files as libblis.so.2 > > openblas: > > usr/lib64/libopenblas.so.0 (SONAME=libopenblas.so.0) > -- general purpose > > usr/lib64/blas/openblas/libblas.so.3 (SONAME=libblas.so.3) > -- candidate of the eselect "blas" unit > -- will be symlinked to usr/lib64/libblas.so.3 by eselect > -- compiled from the same set of object files as > libopenblas.so.0 > > usr/lib64/lapack/openblas/liblapack.so.3 > (SONAME=liblapack.so.3) > -- candidate of 
the eselect "lapack" unit > -- will be symlinked to usr/lib64/liblapack.so.3 by eselect > -- compiled from the same set of object files as > libopenblas.so.0 > > This solution is similar to Debian's[3]. This solution achieves our > goal, > and it requires us to patch upstream build systems (same to Debian). > Preliminary demonstration for this solution is available, see below. > > Is this solution reliable? > -------------------------- > > * A similar solution has been used by Debian for many years. > * Many projects call BLAS/LAPACK libraries through FFI, including > Julia. > (See Julia's standard library: LinearAlgebra) > > Proposed Changes > ---------------- > > 1. Deprecate sci-libs/{blas,cblas,lapack,lapacke}-reference from > gentoo > main repo. They use exactly the same source tarball. It's not > quite > helpful to package these components in a fine-grained manner. A > single > sci-libs/lapack package is enough. > > 2. Merge the "cblas" eselect unit into "blas" unit. It is potentially > harmful when "blas" and "cblas" point to different > implementations. > That means "app-eselect/eselect-cblas" should be deprecated. > > 3. Update virtual/{blas,cblas,lapack,lapacke}. BLAS/LAPACK providers > will be registered in their dependency information. > > Note, ebuilds for BLAS/LAPACK reverse dependencies are expected to > work > with these changes correctly without change. For example, my local > numpy-1.16.1 compilation was successful without change. > > Preliminary Demonstration > ------------------------- > > The preliminary implementation is available in my personal > overlay[4]. > A simple sanity test script `check-cpp.sh` is provided to illustrate > the effectiveness of the proposed solution. > > The script `check-cpp.sh` compiles two C++ programs -- one calls > general > matrix-matrix multiplication from BLAS, while another one calls > general > singular value decomposition from LAPACK. 
Once compiled, this script > will switch different BLAS/LAPACK implementations and run the C++ > programs > without recompilation. > > The preliminary result is avaiable here[5]. (CPU=Power9, > ARCH=ppc64le) > From the experimental results, we find that > > For (512x512) single precision matrix multiplication: > * reference BLAS takes ~360 ms > * BLIS takes ~70 ms > * OpenBLAS takes ~10 ms > > For (512x512) single precision singular value decomposition: > * reference LAPACK takes ~1900 ms > * BLIS (+reference LAPACK) takes ~1500 ms > * OpenBLAS takes ~1100 ms > > The difference in computation speed illustrates the effectiveness of > the proposed solution. Theoretically, any other package could take > advantage from this solution without any recompilation as long as > it's linked against a library with SONAME. > > Acknowledgement > --------------- > This is an on-going GSoC-2019 Porject: > https://summerofcode.withgoogle.com/projects/?sp-page=2#6268942782300160 > > Mentor: Benda Xu > > [1] BLAS = Basic Linear Algebra Subroutines. It's a set of API + ABI. > [2] LAPACK = Linear Algebra PACKage. It's a set of API + ABI. > [3] https://wiki.debian.org/DebianScience/LinearAlgebraLibraries > [4] https://github.com/cdluminate/my-overlay > [5] > https://gist.github.com/cdluminate/0cfeab19b89a8b5ac4ea2c5f942d8f64 >
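The symlink step implied by the layout in the quoted Solution section can be sketched as follows. This is a minimal sketch under stated assumptions: it models only that layout (`<libdir>/<unit>/<impl>/<soname>` candidates plus a single `<libdir>/<soname>` link), `switch_impl` and its temporary link name are hypothetical, plain-text files stand in for the real shared objects, and it is not eselect's actual module code. The new link is created under a temporary name and renamed over the old one, so a program starting mid-switch never finds libblas.so.3 missing.

```python
import os
import tempfile
from pathlib import Path


def switch_impl(libdir, unit, impl, soname):
    """Point <libdir>/<soname> at <libdir>/<unit>/<impl>/<soname>.

    Hypothetical helper modeled on the proposed layout; not eselect code.
    """
    libdir = Path(libdir)
    target = libdir / unit / impl / soname
    if not target.is_file():
        raise FileNotFoundError(f"{impl} provides no '{unit}' candidate {soname}")
    link = libdir / soname
    # Hypothetical temporary name; must be in the same directory (same filesystem).
    tmp = libdir / f".{soname}.eselect-tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()  # leftover from an interrupted switch
    # A relative target keeps the link valid wherever the tree is mounted.
    os.symlink(os.path.relpath(target, libdir), tmp)
    # rename(2) is atomic: readers see either the old link or the new one.
    os.replace(tmp, link)


with tempfile.TemporaryDirectory() as d:
    lib = Path(d)
    for impl in ("reference", "openblas"):
        cand = lib / "blas" / impl / "libblas.so.3"
        cand.parent.mkdir(parents=True)
        cand.write_text(impl)  # stand-in for the real shared object
    switch_impl(lib, "blas", "reference", "libblas.so.3")
    print(os.readlink(lib / "libblas.so.3"))  # blas/reference/libblas.so.3
    switch_impl(lib, "blas", "openblas", "libblas.so.3")
    print((lib / "libblas.so.3").read_text())  # openblas
```

Because `rename(2)` only replaces atomically within one filesystem, the temporary link is placed in `libdir` itself rather than in a system temp directory. A missing candidate raises before anything is touched, so a failed switch leaves the previous selection in place.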
We already have such a solution in the sci-overlay. It has proven
extremely brittle and shaky. The plan is to do this via USE flags,
similar to the python-single-r1 flags. Optionally, we can leave a
"virtual" USE flag with which users can switch implementations at
runtime, but this will not be a supported configuration.

While I understand that runtime switching sounds like a great feature,
it has proven too difficult to get right, and it is too hard to enforce
the invariant that the symlinks are correct. People who want this can
take the virtual+eselect approach in the overlay, but 99% of Gentoo
users will be happy just linking against OpenBLAS/reference-lapack and
not having to fix weird stale symlinks that eselect-alternatives
somehow lost track of.

David

See also:
https://bugs.gentoo.org/632624
https://bugs.gentoo.org/348843#c30
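The symlink invariants mentioned above can at least be checked mechanically. The following is a minimal sketch, assuming the directory layout from the quoted proposal; `check_links` and its arguments are hypothetical and not part of eselect or any existing Gentoo tool. It flags a link that is missing, stale (dangling after its provider was removed) or pointing outside its eselect unit, and optionally reports when a group of links that must agree (for example "blas" and "cblas") resolve to different implementations.

```python
import os
import tempfile
from pathlib import Path


def check_links(libdir, links, must_match=()):
    """Return a list of problems with eselect-style library links.

    libdir:     directory holding the links, e.g. usr/lib64
    links:      maps a link name to its eselect unit,
                e.g. {"libblas.so.3": "blas", "liblapack.so.3": "lapack"}
    must_match: link names that must resolve to the same implementation
    An empty list means every checked invariant holds.
    """
    libdir = Path(libdir)
    problems = []
    chosen = {}  # link name -> implementation directory it resolves into
    for name, unit in links.items():
        link = libdir / name
        if not link.is_symlink():
            problems.append(f"{name}: missing or not a symlink")
        elif not link.exists():  # exists() follows the link
            problems.append(f"{name}: stale link -> {os.readlink(link)}")
        else:
            unit_dir = (libdir / unit).resolve()
            try:
                chosen[name] = link.resolve().relative_to(unit_dir).parts[0]
            except (ValueError, IndexError):  # target not inside <unit>/<impl>/
                problems.append(f"{name}: not a '{unit}' candidate")
    impls = sorted({chosen[n] for n in must_match if n in chosen})
    if len(impls) > 1:
        problems.append(f"{', '.join(must_match)} disagree: {impls}")
    return problems


with tempfile.TemporaryDirectory() as d:
    lib = Path(d)
    cand = lib / "blas" / "openblas" / "libblas.so.3"
    cand.parent.mkdir(parents=True)
    cand.write_text("stand-in")  # placeholder for the real shared object
    os.symlink("blas/openblas/libblas.so.3", lib / "libblas.so.3")
    print(check_links(lib, {"libblas.so.3": "blas"}))  # []
    cand.unlink()  # provider removed without updating the link
    print(check_links(lib, {"libblas.so.3": "blas"}))
    # ['libblas.so.3: stale link -> blas/openblas/libblas.so.3']
```

A check like this only detects the stale-symlink failure mode after the fact; it does not prevent it, which is consistent with treating the eselect route as an unsupported opt-in.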